Home » Linux » Write a C program to measure time spent in context switch in Linux OS

Write a C program to measure time spent in context switch in Linux OS

Posted by: admin November 30, 2017 Leave a comment


Can we write a c program to find out time spent in context switch in Linux? Could you please share code if you have one?


Profiling the switching time is very difficult, but the in-kernel latency profiling tools, as well as oprofile (which can profile the kernel itself) will help you there.

For benchmarking the interactive application performance, I have written a small tool called latencybench that measures unexpected latency spikes:

// Compile with g++ latencybench.cc -o latencybench -lboost_thread-mt
// Should also work on MSVC and other platforms supported by Boost.

#include <boost/format.hpp>
#include <boost/thread/thread.hpp>
#include <boost/date_time.hpp>
#include <algorithm>
#include <cstdlib>
#include <csignal>

volatile bool m_quit = false;

extern "C" void sighandler(int) {
    m_quit = true;

std::string num(unsigned val) {
    if (val == 1) return "one occurrence";
    return boost::lexical_cast<std::string>(val) + " occurrences";

int main(int argc, char** argv) {
    using namespace boost::posix_time;
    std::signal(SIGINT, sighandler);
    std::signal(SIGTERM, sighandler);
    time_duration duration = milliseconds(10);
    if (argc > 1) {
        try {
            if (argc != 2) throw 1;
            unsigned ms = boost::lexical_cast<unsigned>(argv[1]);
            if (ms > 1000) throw 2;
            duration = milliseconds(ms);
        } catch (...) {
            std::cerr << "Usage: " << argv[0] << " milliseconds" << std::endl;
            return EXIT_FAILURE;
    typedef std::map<long, unsigned> Durations;
    Durations durations;
    unsigned samples = 0, wrongsamples = 0;
    unsigned max = 0;
    long last = -1;
    std::cout << "Measuring actual sleep delays when requesting " << duration.total_milliseconds() << " ms: (Ctrl+C when done)" << std::endl;
    ptime begin = boost::get_system_time();
    while (!m_quit) {
        ptime start = boost::get_system_time();
        boost::this_thread::sleep(start + duration);
        long actual = (boost::get_system_time() - start).total_milliseconds();
        unsigned num = ++durations[actual];
        if (actual != last) {
            std::cout << "\r  " << actual << " ms " << std::flush;
            last = actual;
        if (actual != duration.total_milliseconds()) {
            if (num > max) max = num;
            std::cout << "spike at " << start - begin << std::endl;
            last = -1;
    if (samples == 0) return 0;
    std::cout << "\rTotal measurement duration:  " << boost::get_system_time() - begin << "\n";
    std::cout << "Number of samples collected: " << samples << "\n";
    std::cout << "Incorrect delay count:       " << wrongsamples << boost::format(" (%.2f %%)") % (100.0 * wrongsamples / samples) << "\n\n";
    std::cout << "Histogram of actual delays:\n\n";
    unsigned correctsamples = samples - wrongsamples;
    const unsigned line = 60;
    double scale = 1.0;
    char ch = '+';
    if (max > line) {
        scale = double(line) / max;
        ch = '*';
    double correctscale = 1.0;
    if (correctsamples > line) correctscale = double(line) / correctsamples;
    for (Durations::const_iterator it = durations.begin(); it != durations.end(); ++it) {
        std::string bar;
        if (it->first == duration.total_milliseconds()) bar = std::string(correctscale * it->second, '>');
        else bar = std::string(scale * it->second, ch);
        std::cout << boost::format("%5d ms | %s %d") % it->first % bar % it->second << std::endl;
    std::cout << "\n";
    std::string indent(30, ' ');
    std::cout << indent << "+-- Legend ----------------------------------\n";
    std::cout << indent << "|  >  " << num(1.0 / correctscale) << " (of " << duration.total_milliseconds() << " ms delay)\n";
    if (wrongsamples > 0) std::cout << indent << "|  " << ch << "  " << num(1.0 / scale) << " (of any other delay)\n";

Results on Ubuntu 2.6.32-14-generic kernel. While measuring, I was compiling C++ code with four cores and playing a game with OpenGL graphics at the same time (to make it more interesting):

Total measurement duration:  00:01:45.191465
Number of samples collected: 10383
Incorrect delay count:       196 (1.89 %)

Histogram of actual delays:

   10 ms | >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 10187
   11 ms | *************************************************** 70
   12 ms | ************************************************************ 82
   13 ms | ********* 13
   14 ms | ********* 13
   15 ms | ** 4
   17 ms | *** 5
   18 ms | * 2
   19 ms | **** 6
   20 ms |  1

                              +-- Legend ----------------------------------
                              |  >  169 occurrences (of 10 ms delay)
                              |  *  one occurrence (of any other delay)

With rt-patched kernels I get much better results, pretty much 10-12 ms only.

The legend in the printout appears to be suffering of a rounding error or something (and the source code pasted is not the exact same version). I never really polished this application for a release…


If you have superuser privileges, you can run a SystemTap program with probe points for context switches and print the current time at each one:

probe scheduler.ctxswitch {
    printf("Switch from %d to %d at %d\n", prev_pid, next_pid, gettimeofday_us())

I’m not sure how reliable the output data is, but it’s a quick and easy way to get some numbers.


What do you think , measuring the context switching with seconds or milliseconds or even microseconds . All happening less than nano-sec . If your want to spend that huge of time for context switching which could be measured ,then …
Try some real-mode kernel type code written on Assembly , you might see something.


Short answer – no. Long answer bellow.

Context switch roughly happens when either:

  1. User process enters the kernel via system call or a trap (e.g. page fault) and requested data (e.g. file contents) is not yet available, so the kernel puts said user process into sleep state and switches to another runnable process.
  2. Kernel detects that given user process consumed its full time quanta (this happens in code invoked from timer interrupt.)
  3. Data becomes available for higher current priority process that is presently sleeping (this happens from code invoked from/around IO interrupts.)

The switch itself is one-way, so the best we can do in userland (I assume that’s what you are asking) is to measure sort of an RTT, from our process to another and back. The other process also takes time to do its work. We can of course make two or more processes cooperate on this, but the thing is that the kernel doesn’t guarantee that one of our processes is going to be picked next. It’s probably possible to predictably switch to a given process with RT scheduler, but I have no advise here, suggestions welcome.


Measuring the cost of a context switch is a little trickier. We can compute time spent in context switch by running two processes on a single CPU, and setting up three Linux pipes between them;

  • two pipes for sharing string between the process and
  • 3rd one will be used to share a time spent at child process.

The first process then issues a write to the first pipe, and waits for a read on the second; upon seeing the first process waiting for something to read from the second pipe, the OS puts the first process in the blocked state, and switches to the other process, which reads from the first pipe and then writes to the second. When the second process tries to read from the first pipe again, it blocks, and thus the back-and-forth cycle of communication continues. By measuring the cost of communicating like this repeatedly, you can make a good estimate of the cost of a context switch.

One difficulty in measuring context-switch cost arises in systems with more than one CPU; what you need to do on such a system is ensure that your context-switching processes are located on the same processor
. Fortunately, most operating systems have calls to bind a process to a particular processor; on Linux, for example, the sched_setaffinity()call is what you’re looking for. By ensuring both processes are on the same processor, you are making sure to measure the cost of the OS stopping one process and restoring another on the same CPU.

Here I’m posting my solution for computing context-switch between processes.

    #define _GNU_SOURCE
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <sched.h>
#include <stdlib.h>
#include <string.h>
#include <linux/unistd.h>
#include <sys/time.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <errno.h>

pid_t getpid( void )
    return syscall( __NR_getpid );

int main()
        To make sure context-switching processes are located on the same processor :
        1. Bind a process to a particular processor using sched_setaffinity.    
        2. To get the maximum priority value (sched_get_priority_max) that can be used with 
           the scheduling algorithm identified by policy (SCHED_FIFO).** 

    cpu_set_t set;
    struct sched_param prio_param;
    int prio_max;

    CPU_ZERO( &set );
    CPU_SET( 0, &set );
        memset(&prio_param,0,sizeof(struct sched_param));

    if (sched_setaffinity( getpid(), sizeof( cpu_set_t ), &set ))
        perror( "sched_setaffinity" );

    if( (prio_max = sched_get_priority_max(SCHED_FIFO)) < 0 )

    prio_param.sched_priority = prio_max;
    if( sched_setscheduler(getpid(),SCHED_FIFO,&prio_param) < 0 )

        1. To create a pipe for a fork, the parent and child processes use pipe to read and write, 
           read and write string, using this for context switch.
        2. The parent process first to get the current timestamp (gettimeofday), then write to the pipe,. 
           Then the child should be read in from the back, 
           then the child process to write string, the parent process reads. 
           After the child process to get the current timestamp. 
           This is roughly the difference between two timestamps n * 2 times the context switch time.

    int     ret=-1;
    int     firstpipe[2];
    int     secondpipe[2];
    int     timepipe[2];
        int     nbytes;
        char    string[] = "Hello, world!\n";
        char    temp[] = "Sumit Gemini!\n";
        char    readbuffer[80];
        char    tempbuffer[80];
    int     i=0;
    struct  timeval start,end;

    // Create an unnamed first pipe
        if (pipe(firstpipe) == -1) 
            fprintf(stderr, "parent: Failed to create pipe\n");
            return -1;

    // create an unnamed Second pipe
        if (pipe(secondpipe) == -1) 
            fprintf(stderr, "parent: Failed to create second pipe\n");
            return -1;

    // Create an unnamed time pipe which will share in order to show time spend between processes
        if (pipe(timepipe) == -1) 
            fprintf(stderr, "parent: Failed to create time pipe\n");
            return -1;

    else if(ret==0)
                int n=-1;
        printf("Child  ----> %d\n",getpid());

                    nbytes = read(firstpipe[0], readbuffer, sizeof(readbuffer));
                    printf("Received string: %s", readbuffer);
            write(secondpipe[1], temp, strlen(temp)+1);

                n = sizeof(struct timeval);

                if( write(timepipe[1],&end,sizeof(struct timeval)) != n )
                fprintf(stderr, "child: Failed to write in time pipe\n");

        double switch_time;
                int n=-1;
        printf("Parent  ----> %d\n",getpid());
                /* Read in a string from the pipe */

            write(firstpipe[1], string, strlen(string)+1);
            read(secondpipe[0], tempbuffer, sizeof(tempbuffer));
                    printf("Received temp: %s", tempbuffer);

        n = sizeof(struct timeval);
                if( read(timepipe[0],&end,sizeof(struct timeval)) != n )
                fprintf(stderr, "Parent: Failed to read from time pipe\n");

        switch_time = ((end.tv_sec-start.tv_sec)*1000000+(end.tv_usec-start.tv_usec))/1000.0;
                printf("context switch between two processes: %0.6lfms\n",switch_time/(5*2));


    return 0;


Why not just this one as rough estimation?

#include <ctime>
#include <cstdio>
#include <sys/time.h>
#include <unistd.h>

int main(int argc, char **argv) {
        struct timeval tv, tvt;
        int diff;
        gettimeofday(&tv, 0);
        diff = tvt.tv_usec - tv.tv_usec;
        if (fork() != 0) {
                gettimeofday(&tvt, 0);
                diff = tvt.tv_usec - tv.tv_usec;
                printf("%d\n", diff);
        return 0;

Note: Actually we shouldn’t put null as the second argument, check man gettimeofday. Also, we should check if tvt.tv_usec > tv.tv_usec! Just a draft.