
Can somebody explain why the timer fails with SIGSEGV after 5-7 iterations?

It happens in both cases: with synchronization and without. The operating system is Ubuntu 15.04 with glibc 2.21 (Ubuntu GLIBC 2.21-0ubuntu4).

void timer_thread (sigval signal_value) {
    printf ("Timer callback!\n");
}

int main(int argc, char* argv[]) {

    const int TIMER_COUNT = 300;

    for (int i = 0; i < 10000; i++) {
        int status = 0;

        timer_t timer_id[TIMER_COUNT] = {};
        memset(&timer_id[0], 0, sizeof(timer_t)*TIMER_COUNT);

        for (int j = 0; j < TIMER_COUNT; j++) {

            struct itimerspec ts = {};
            struct sigevent se = {};

            memset(&ts, 0, sizeof(itimerspec));
            memset(&se, 0, sizeof(sigevent));

            se.sigev_notify = SIGEV_THREAD;
            se.sigev_value.sival_int = j;
            se.sigev_notify_function = timer_thread;

            // Specify a repeating timer that fires every 100000 ns (100 us).
            memset(&ts, 0, sizeof(ts));
            ts.it_value.tv_nsec = 100000;
            ts.it_interval.tv_nsec = 100000;

            status = timer_create(CLOCK_REALTIME, &se, &timer_id[j]);
            assert(!status && "Create timer");

            status = timer_settime(timer_id[j], 0, &ts, 0);
            assert(!status && "Set timer");
        }

        for (int j = 0; j < TIMER_COUNT; j++) {
            usleep(100);
            //stop and delete

            status = timer_delete(timer_id[j]);
            assert(!status && "Fail delete timer");
        }
    }
    printf("Success!\n");
    return 0;
}

GDB backtrace:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  __pthread_create_2_1 (newthread=newthread@entry=0x7f00e9817e28, attr=attr@entry=0x11c47e8, start_routine=start_routine@entry=0x7f00e93f6eb0 <timer_sigev_thread>, arg=<optimized out>)
    at pthread_create.c:711
711 pthread_create.c: No such file or directory.
(gdb) bt
#0  __pthread_create_2_1 (newthread=newthread@entry=0x7f00e9817e28, attr=attr@entry=0x11c47e8, start_routine=start_routine@entry=0x7f00e93f6eb0 <timer_sigev_thread>, arg=<optimized out>)
    at pthread_create.c:711
#1  0x00007f00e93f6e7a in timer_helper_thread (arg=<optimized out>) at ../sysdeps/unix/sysv/linux/timer_routines.c:125
#2  0x00007f00e91db6aa in start_thread (arg=0x7f00e9818740) at pthread_create.c:333
#3  0x00007f00e866feed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Build command-line: /usr/bin/c++ -lrt -lpthread -g ./main.cc

Full code posix timer with synchronization

Full code posix timer with sleep

xaizek
  • Please confirm that you are continually creating, timing-out and then deleting 300 timers. Over, and over, and over again. – Martin James Mar 03 '16 at 11:37
  • Right. I create about 300 timers, then sleep or wait on a condition variable, and after every timer has fired at least once, I delete all the timers. This works about 5 times, then on the 6th iteration or later it fails. – Антон Грицевич Mar 03 '16 at 13:52
  • And 300 timers are not needed; it fails with 50 or fewer timers. – Антон Грицевич Mar 03 '16 at 15:35
  • This seems to be a runtime question, but the posted code is missing the #include statements. Do you expect us to guess which header files are being included? – user3629249 Mar 04 '16 at 16:21
  • I don't repro this problem. I'm running Mint 17.2 x86_64 kernel v3.16.0 in a VM (which I'd have guessed would make the problem worse). Try running this in another terminal window to see what values you get for the "signal queue limit": `cat /proc/[programs-PID]/status | grep SigQ`. I get results like: `SigQ: 271/22308`. The first value is the number of signals queued up and the second is the maximum. Here the first never exceeds 300 (what you'd expect, since at most 300 timers are active at a time). See if you get any higher values in the first entry or a really low value in the second. – Michael Burr Mar 05 '16 at 08:27
  • Oh, and in case it matters, my glibc version is: `GNU C Library (Ubuntu EGLIBC 2.19-0ubuntu6.6) stable release version 2.19` – Michael Burr Mar 05 '16 at 08:32

1 Answer


The following code compiles cleanly, runs, and does not crash.

Notice the expanded time interval for each timer; this gives all 300 timers time to call printf() and return.

BTW: calling printf() in a signal handler is a very bad idea.

#include <stdio.h>  // printf()
#include <stdlib.h> // exit(), EXIT_FAILURE
#include <signal.h>
#include <time.h>
#include <unistd.h> // sleep()
#include <string.h> // memset()

#define TIMER_COUNT (300)

#define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
                           } while (0)

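void timer_thread (union sigval sigev_value)
{
    (void)sigev_value;
    // SIGEV_THREAD callbacks may run concurrently in separate threads,
    // so increment the shared counter atomically (GCC/Clang builtin).
    static int count = 0;
    int current = __sync_add_and_fetch(&count, 1);
    fprintf ( stdout, "Timer callback count: %d\n", current);
}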

int main( void )
{
    timer_t timer_id[TIMER_COUNT];
    memset(timer_id, '\0', sizeof(timer_t)*TIMER_COUNT);

    struct itimerspec ts;
    struct sigevent se;

    for (size_t j = 0; j < TIMER_COUNT; j++)
    {
        memset(&ts, '\0', sizeof(struct itimerspec));
        memset(&se, '\0', sizeof(struct sigevent));

        se.sigev_notify = SIGEV_THREAD;
        se.sigev_value.sival_int = (int)j;
        se.sigev_notify_function = timer_thread;

        // Specify a repeating timer that fires every 2 seconds.
        ts.it_interval.tv_sec = 2;
        ts.it_interval.tv_nsec = 0;
        ts.it_value.tv_sec = ts.it_interval.tv_sec;
        ts.it_value.tv_nsec = ts.it_interval.tv_nsec;

        if( -1 == timer_create(CLOCK_REALTIME, &se, &timer_id[j]) )
            errExit( "timer_create failed" );

        if( -1 == timer_settime(timer_id[j], 0, &ts, NULL) )
            errExit("timer_settime failed");
    }

    sleep(10);

    for (int j = 0; j < TIMER_COUNT; j++)
    {
        //stop and delete
        fprintf( stdout, "stopping timer: %d, with ID:  %zu\n", j, (size_t)timer_id[j]);
        ts.it_value.tv_sec = 0;
        ts.it_value.tv_nsec = 0;
        ts.it_interval.tv_sec = 0;
        ts.it_interval.tv_nsec = 0;
        if( -1 == timer_settime( timer_id[j], 0, &ts, NULL) )
            errExit("timer_settime (to disable timer) failed" );

        fprintf( stdout, "deleting timer: %d, with ID:  %zu\n", j, (size_t)timer_id[j]);
        if( -1 == timer_delete(timer_id[j]) )
            errExit("timer_delete failed" );
    }

    printf("Success!\n");
    return 0;
} // end function: main
user3629249
  • I agree that calling `printf()` in the callback for hundreds of 100us timers is asking for some trouble, but the `SIGEV_THREAD` configuration specified in the `se.sigev_notify` item means that the timer callback isn't done in a signal handler - it's done in a thread. The thread could be created specifically for the callback or it could be a worker thread in a thread pool. – Michael Burr Mar 05 '16 at 05:50
  • I ran the above code, with a modification so it would iterate the 10000 times as in the OPs posted code. It ran (forever it seemed) but it did not crash. (ubuntu linux 14.04) – user3629249 Mar 05 '16 at 09:20
  • Notice that I slowed down the operation rate (2-second timer interval and 10 seconds before starting to delete timers). I have not run it with 10000usec timers and no waiting before killing the timers. You can test that scenario. However, I suspect some OS queue is being overrun when trying to run too fast (mostly due to the call to printf() in the callback function). – user3629249 Mar 05 '16 at 09:25
  • I also suspected that some queue was being overrun with the callback performing a `printf()`. But when I try to reproduce the OP's problem on my system (with 100usec timers) I don't see the app crash on my system - it all seems to work. I don't know what to conclude from that. – Michael Burr Mar 05 '16 at 18:33
  • I modified the code to have the same timings and loop count as the OP's posted code, on an AMD 4-core CPU with 8 GB of RAM. All 4 cores are running at around 60 percent, and memory ran at nearly 80 percent utilization (no swap space in use, although 8 GB is allocated). If the OP's computer is a less well endowed machine, then it was probably running the CPU(s) and/or memory at 100 percent. That would make the user interface unresponsive, so it would look as if the application had crashed. – user3629249 Mar 05 '16 at 20:16
  • I also noticed, as the application progressed (it has now been 20 minutes) that the memory utilization is slowly increasing (now at 90 percent). That increasing memory utilization is probably the backlog of messages being printed – user3629249 Mar 05 '16 at 20:19
  • CPU utilization seems to be maintaining around 60 percent, but memory utilization is maintaining around 85 percent and swap is using ~1gig and slowly rising – user3629249 Mar 05 '16 at 20:26
  • Terminal output is running way, way behind. Memory utilization is now over 90 percent and swap utilization is over 1.5 GB and rising. It is now obvious that my computer cannot keep up with the timer events (and as it falls further and further behind, more and more timers expire, resulting in a steady increase in the number of backlogged calls to the timer_thread function). Even in this short time, swap utilization has risen to over 2 GB and is still climbing. Before it crashes my OS, I'm going to kill the application. – user3629249 Mar 05 '16 at 20:36
  • After killing the application, memory utilization dropped to 0.77 GB and swap dropped to 0.359 GB. Normally swap runs at 0 and memory utilization at less than 0.5 GB, so all that mass of backlogged timer events and printf() output was not properly cleaned up when the process was killed (Ubuntu Linux 14.04). – user3629249 Mar 05 '16 at 20:43
  • Your code fails on my system too. Possibly it depends on the glibc version. – Антон Грицевич Mar 14 '16 at 15:59
  • I'm using the latest glibc from the automatic updates for my OS, Ubuntu Linux 14.04.4. What are you using? – user3629249 Mar 14 '16 at 20:55