6

So I have a pretty substantial program which is freezing sporadically.
This program uses Qt, open scene graph, and google logging. This freeze happens mid printing in the google logging. The program itself is printing a ton of debugging information. I was able to connect to the program via gdb-server and this is the stack trace:

#0  0x000000397ac0e030 in __write_nocancel () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007f5eecb74aeb in google::LogMessage::SendToLog() () from /lib64/libglog.so.0
#2  0x00007f5eecb71fc7 in google::LogMessage::Flush() () from /lib64/libglog.so.0
#3  0x00007f5eecb721a9 in google::LogMessage::~LogMessage() () from /lib64/libglog.so.0
#4  0x00000000004874a6 in LSDB::process (this=0x242d918, lsp=0x25f9200, circuit=0x24c7af0) at ../src/model/trill/LSDB.cpp:481
#5  0x00000000004a0f6f in Circuit::rx (this=0x24c7af0, eth=0x246fdf0) at ../src/model/trill/Circuit.cpp:355
#6  0x000000000045c950 in Circuit::qt_static_metacall (_o=0x24c7af0, _c=QMetaObject::InvokeMetaMethod, _id=0, _a=0x7fffaade95a0)
    at ../src/model/trill/Circuit.moc.cpp:55
#7  0x000000398798cb9f in QMetaObject::activate (sender=0x2470140, m=<optimized out>, local_signal_index=<optimized out>, argv=0x7fffaade95a0)
    at kernel/qobject.cpp:3547
#8  0x0000000000459610 in FramePublisher::subscription (this=0x2470140, _t1=0x246fdf0) at ../src/model/system/FramePublisher.moc.cpp:98
#9  0x000000000047c0d6 in FramePublisher::rx (this=0x2470140, frame=0x246fdf0) at ../src/model/system/FramePublisher.hpp:21
#10 0x000000000047bedb in EthernetPort::rx (this=0x246d7a0, frame=0x25a4180) at ../src/model/system/EthernetPort.cpp:81
#11 0x000000000045a208 in EthernetPort::qt_static_metacall (_o=0x246d7a0, _c=QMetaObject::InvokeMetaMethod, _id=1, _a=0x7fffaade9810)
    at ../src/model/system/EthernetPort.moc.cpp:51
#12 0x000000398798cb9f in QMetaObject::activate (sender=0x246d7a0, m=<optimized out>, local_signal_index=<optimized out>, argv=0x7fffaade9810)
    at kernel/qobject.cpp:3547
#13 0x0000000000459ddc in Port::rx (this=0x246d7a0, _t1=0x25a4180) at ../src/model/system/Port.moc.cpp:110
#14 0x00000000004803a6 in Port::inject (this=0x246d7a0, frame=0x25a4180) at ../src/model/system/Port.cpp:25

...

Notice the freeze itself is happening in __write_nocancel. There is just a single thread running...

(gdb) info threads Id Target Id Frame
* 1 Thread 21507 0x000000397ac0e030 in __write_nocancel () at ../sysdeps/unix/syscall-template.S:82

Any ideas on what is causing the freeze? Let me know what other information might help.

ymoreau
  • 3,402
  • 1
  • 22
  • 60
David Mokon Bond
  • 1,576
  • 2
  • 18
  • 42
  • 1
    Are you writing to anything special (a socket, a file on an nfs server, etc. ?) I'd suggest you run the program under valgrind to look for memory corruption, if possible – nos Jul 04 '12 at 20:41
  • So I'm using GLOG with GLOG_logtostderr=1 which means it will write the log messages to stderr. From the documentation it says it skips the write to file when you have that flag set. http://google-glog.googlecode.com/svn/trunk/doc/glog.html I'm not seeing any valgrind issues that should be causing this. – David Mokon Bond Jul 05 '12 at 22:14
  • Did you ever find out what is causing this? – Ton van den Heuvel Jan 04 '13 at 07:43
  • No I have not. I would love it if someone did know what was wrong though... – David Mokon Bond Jan 04 '13 at 14:36
  • On further investigation it seems to be somehow related to printing too fast... I also noticed when I kill -9 the process the terminal isn't restored. – David Mokon Bond Jan 06 '13 at 04:57
  • FYI, I had an issue with `__write_nocancel` taking up a large amount of time in my programs communicating over TCP sockets as well. As it turned out, this was due to [Nagle's algorithm](http://en.wikipedia.org/wiki/Nagle%27s_algorithm). Setting `TCP_NODELAY` on the TCP sockets solved the issue. Also see: http://stackoverflow.com/questions/14157143/how-can-the-following-qtcpsocket-delay-be-explained. – Ton van den Heuvel Jan 06 '13 at 20:25
  • Yea that shouldnt be mine. Mine are simply log messages being printed and I don't have a remote sys logger configured at all. thanks though. – David Mokon Bond Jan 06 '13 at 23:03

1 Answers1

3

You said that the output is being written to stderr. If stderr is not being constantly read by a separate process, the I/O buffer may be full so it is waiting for stderr to be read before it writes more.

If process A is writing to stderr and process B normally reads from stderr but is currently waiting on process A, this could cause deadlock. In your case, it could only cause deadlock if the write to stderr is too large since a smaller write would succeed immediately and free process A, thus freeing process B to read from stderr.

This is some speculation on my part but in general I would assume that your issue is that you're waiting to write to a full buffer.

Will Brode
  • 1,026
  • 2
  • 10
  • 27
  • Well in this case process B is the terminal so it shouldn't be waiting on my process for anything that I can imagine. If the write buffer filled up that process should yield and the terminal will have some time to process what it needs. Thanks for the thoughts though. – David Mokon Bond Jun 13 '13 at 02:31