
I wrote a C program that sometimes dies after running for a few days. It runs on an embedded device, so it is difficult to debug properly (no local gdb, no valgrind, but I do have strace). It does not generate a core file when it dies, even though ulimit -c unlimited is set.

When it dies, all that is displayed on the console is 'killed'. The program's own logs do not help. I suspect either a buffer overflow, a memory leak (a missing free) or a multithreading issue.

I do not use a signal handler in the code (could that help?). Where does this kill -9 come from?

I've tried the following:

$ ./MyProg
killed

$ time -v ./MyProg
    Command terminated by signal 9
Command being timed: "./MyProg"
User time (seconds): 762.04
System time (seconds): 1360.74
Percent of CPU this job got: 2%
Elapsed (wall clock) time (h:mm:ss or m:ss): 23h 4m 23s
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 0
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 40
Minor (reclaiming a frame) page faults: 29567
Voluntary context switches: 4742276
Involuntary context switches: 187702
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

$ cat /proc/PID/smaps # Shortly before the crash
10000000-10042000 r-xp 00000000 00:0b 203162716  /root/MyProg
Size:                264 kB
Rss:                 224 kB
Pss:                 224 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:       224 kB
Private_Dirty:         0 kB
Referenced:          120 kB
10052000-10055000 rwxp 00042000 00:0b 203162716  /root/MyProg
Size:                 12 kB
Rss:                  12 kB
Pss:                  12 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:        12 kB
Referenced:           12 kB
10055000-1706a000 rwxp 10055000 00:00 0          [heap]
Size:             114772 kB
Rss:              114716 kB
Pss:              114716 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:    114716 kB
Referenced:       114716 kB
30000000-30005000 r-xp 00000000 00:0b 135513112  /lib/ld-uClibc-0.9.29.so
Size:                 20 kB
Rss:                  20 kB
Pss:                   1 kB
Shared_Clean:         20 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:           20 kB
30005000-30006000 rw-p 30005000 00:00 0 
Size:                  4 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
30014000-30015000 r--p 00004000 00:0b 135513112  /lib/ld-uClibc-0.9.29.so
Size:                  4 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
30015000-30016000 rwxp 00005000 00:0b 135513112  /lib/ld-uClibc-0.9.29.so
Size:                  4 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
30016000-30027000 r-xp 00000000 00:0b 135513124  /lib/libm-0.9.29.so
Size:                 68 kB
Rss:                  12 kB
Pss:                   1 kB
Shared_Clean:         12 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:           12 kB
30027000-30036000 ---p 30027000 00:00 0 
Size:                 60 kB
Rss:                   0 kB
Pss:                   0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
30036000-30037000 r--p 00010000 00:0b 135513124  /lib/libm-0.9.29.so
Size:                  4 kB
Rss:                   4 kB
Pss:                   0 kB
Shared_Clean:          4 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            4 kB
30037000-30038000 rwxp 00011000 00:0b 135513124  /lib/libm-0.9.29.so
Size:                  4 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
30038000-30043000 r-xp 00000000 00:0b 135513129  /lib/libpthread-0.9.29.so
Size:                 44 kB
Rss:                  44 kB
Pss:                  44 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:        44 kB
Private_Dirty:         0 kB
Referenced:           20 kB
30043000-30052000 ---p 30043000 00:00 0 
Size:                 60 kB
Rss:                   0 kB
Pss:                   0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
30052000-30053000 r--p 0000a000 00:0b 135513129  /lib/libpthread-0.9.29.so
Size:                  4 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
30053000-30058000 rwxp 0000b000 00:0b 135513129  /lib/libpthread-0.9.29.so
Size:                 20 kB
Rss:                   8 kB
Pss:                   8 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         8 kB
Referenced:            8 kB
30058000-3005a000 rwxp 30058000 00:00 0 
Size:                  8 kB
Rss:                   0 kB
Pss:                   0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
3005a000-3005b000 r-xp 00000000 00:0b 135513131  /lib/librt-0.9.29.so
Size:                  4 kB
Rss:                   4 kB
Pss:                   2 kB
Shared_Clean:          4 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
3005b000-3006a000 ---p 3005b000 00:00 0 
Size:                 60 kB
Rss:                   0 kB
Pss:                   0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
3006a000-3006b000 r--p 00000000 00:0b 135513131  /lib/librt-0.9.29.so
Size:                  4 kB
Rss:                   4 kB
Pss:                   2 kB
Shared_Clean:          4 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
3006b000-3006c000 rwxp 00001000 00:0b 135513131  /lib/librt-0.9.29.so
Size:                  4 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
3006c000-30079000 r-xp 00000000 00:0b 135513120  /lib/libgcc_s.so.1
Size:                 52 kB
Rss:                  28 kB
Pss:                  21 kB
Shared_Clean:          8 kB
Shared_Dirty:          0 kB
Private_Clean:        20 kB
Private_Dirty:         0 kB
Referenced:           20 kB
30079000-30088000 ---p 30079000 00:00 0 
Size:                 60 kB
Rss:                   0 kB
Pss:                   0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
30088000-30089000 rwxp 0000c000 00:0b 135513120  /lib/libgcc_s.so.1
Size:                  4 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
30089000-300d0000 r-xp 00000000 00:0b 135513132  /lib/libuClibc-0.9.29.so
Size:                284 kB
Rss:                 188 kB
Pss:                  22 kB
Shared_Clean:        180 kB
Shared_Dirty:          0 kB
Private_Clean:         8 kB
Private_Dirty:         0 kB
Referenced:          164 kB
300d0000-300df000 ---p 300d0000 00:00 0 
Size:                 60 kB
Rss:                   0 kB
Pss:                   0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
300df000-300e0000 r--p 00046000 00:0b 135513132  /lib/libuClibc-0.9.29.so
Size:                  4 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
300e0000-300e1000 rwxp 00047000 00:0b 135513132  /lib/libuClibc-0.9.29.so
Size:                  4 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
300e1000-300e6000 rwxp 300e1000 00:00 0 
Size:                 20 kB
Rss:                  16 kB
Pss:                  16 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:        16 kB
Referenced:           16 kB
7f3fc000-7f400000 rwxp 7f3fc000 00:00 0 
Size:                 16 kB
Rss:                  16 kB
Pss:                  16 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:        16 kB
Referenced:           16 kB
7faf8000-7fb0d000 rwxp 7ffeb000 00:00 0          [stack]
Size:                 84 kB
Rss:                  12 kB
Pss:                  12 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:        12 kB
Referenced:           12 kB

From /var/log/messages:

user.warn kernel: dropbear invoked oom-killer: gfp_mask=0x1201d2, order=0, oomkilladj=0
user.warn kernel: Call Trace:
user.warn kernel: show_stack+0x50/0x184 (unreliable)
user.warn kernel: oom_kill_process+0x54/0x1ac
user.warn kernel: out_of_memory+0x1a8/0x1dc
user.warn kernel: __alloc_pages+0x24c/0x2dc
user.warn kernel: __do_page_cache_readahead+0xc4/0x220
user.warn kernel: filemap_fault+0x150/0x37c
user.warn kernel: __do_fault+0x6c/0x40c
user.warn kernel: do_page_fault+0x274/0x3ec
user.warn kernel: handle_page_fault+0xc/0x80
user.warn kernel: Mem-info:
user.warn kernel: DMA per-cpu:
user.warn kernel: CPU    0: hi:   42, btch:   7 usd:  31
user.warn kernel: Active:29872 inactive:194 dirty:0 writeback:0 unstable:0
user.warn kernel:  free:356 slab:1392 mapped:84 pagetables:65 bounce:0
user.warn kernel: DMA free:1424kB min:1440kB low:1800kB high:2160kB active:119488kB inactive:776kB present:130048kB pages_scanned:194475 all_unreclaimable? yes
user.warn kernel: lowmem_reserve[]: 0 0 0
user.warn kernel: DMA: 0*4kB 0*8kB 1*16kB 0*32kB 0*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1424kB
user.warn kernel: 204 total pagecache pages
user.warn kernel: Free swap:            0kB
user.warn kernel: 32768 pages of RAM
user.warn kernel: 0 pages of HIGHMEM
user.warn kernel: 1236 free pages
user.warn kernel: 770 reserved pages
user.warn kernel: 112 pages shared
user.warn kernel: 0 pages swap cached
user.err kernel: Out of memory: kill process 206 (MyProg) score 623 or a child
user.err kernel: Killed process 206 (MyProg)

What else can I try? Thanks.

ks1322
dargaud
  • Signal 9 is `SIGKILL`, which means the process was deliberately killed rather than crashing. What is the memory usage of the program when it is killed? Of the system in general? Things like this usually happen when a Linux system runs low on memory. – Some programmer dude Jan 16 '19 at 15:18
  • The `SIGKILL` is likely coming from something like the OOM killer. Does your system have a `/var/log/messages` or similar? `grep -i oom /var/log/messages` might be useful. In any case, you should be checking files like `/var/log/messages` for information around the time(s) your process gets killed. – Andrew Henle Jan 16 '19 at 15:29
  • Good call, it's a memory issue, I added the /var/log/messages info – dargaud Jan 16 '19 at 15:42
  • Since the log mentions `oom-killer`, your program's being stopped because it uses too much memory. You probably have a memory leak. – Jonathan Leffler Jan 16 '19 at 15:43
  • This appears relevant: https://lwn.net/Articles/104185/ – Andrew Henle Jan 16 '19 at 16:26
  • @Jonathan - Linux's design leads to unexplained OOM kills as a normal course of business. It regularly overcommits memory. That is, it returns success on an allocation request when it should return failure. No leak required. I see it all the time on a GoDaddy VM with 1 GB of RAM acting as a web server. The OOM killer hits the MySQL process and corrupts our database on occasion. To avoid the broken allocator, switch to Solaris. It does not oversubscribe memory. – jww Jan 16 '19 at 16:48
  • You can use `gdb` 'remotely' by running a `gdb` server alongside the embedded program, then having your local `gdb` connect to it. The details are in the `gdb` manual. – user3629249 Jan 16 '19 at 19:06
  • IMHO nothing in the system kills your process with a `SIGKILL`, but I may be wrong. It must be a program that kills your process. – Luis Colorado Jan 21 '19 at 09:00

1 Answer

What else can I try?

I recommend trying remote gdb debugging, and you had better use Linux on the debugging host (that is, your development laptop).

(You could even cross-build your program with the DWARF debugging information kept in a separate file; I know it is possible, but I forget the details.)

If your embedded system runs Linux, be sure to disable memory over-commitment.
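To check which over-commit policy is currently active, the program can read the kernel setting at startup. The following is only a minimal sketch: the function name read_overcommit_mode is made up for illustration, and it assumes the usual Linux location /proc/sys/vm/overcommit_memory, where 0 means heuristic over-commit, 1 means always over-commit and 2 means over-commit is disabled.

```c
#include <stdio.h>

/* Hypothetical helper: returns the over-commit mode (0, 1 or 2) read from
 * `path`, or -1 if the file cannot be opened or parsed.
 * On Linux the path is normally /proc/sys/vm/overcommit_memory. */
int read_overcommit_mode(const char *path)
{
    FILE *f = fopen(path, "r");
    int mode = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;              /* file did not start with an integer */
    fclose(f);
    return mode;
}
```

Calling read_overcommit_mode("/proc/sys/vm/overcommit_memory") and logging the result shows whether the setting took effect. With mode 2, malloc is expected to return NULL when memory runs out, so the program gets a chance to log the failure instead of being killed later.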

See also this.
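Since the kernel log in the question shows the OOM killer at work and the heap is already at about 114 MB on a machine with 32768 pages of 4096 bytes (128 MB) of RAM, another thing to try is having the program log its own resident set size from time to time. This is a rough sketch, not a definitive implementation: the function name read_vm_rss_kb is invented here, and it assumes the kernel exposes a VmRSS line in /proc/self/status.

```c
#include <stdio.h>

/* Hypothetical helper: returns the VmRSS value in kB found in a
 * /proc/<pid>/status-style file, or -1 if the file cannot be opened
 * or contains no VmRSS line. */
long read_vm_rss_kb(const char *status_path)
{
    FILE *f = fopen(status_path, "r");
    char line[256];
    long rss_kb = -1;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        /* Whitespace in the format matches the tabs/spaces after the colon. */
        if (sscanf(line, "VmRSS: %ld kB", &rss_kb) == 1)
            break;
    }
    fclose(f);
    return rss_kb;
}
```

Calling read_vm_rss_kb("/proc/self/status") from the main loop every few minutes and writing the value to the program's log should show whether memory grows steadily (which would point to a leak) or jumps at a particular operation.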

Basile Starynkevitch