
My program has this weird problem:

It tries to find a device by reading the output of some commands through a pipe:

FILE* fp = NULL;
fp = popen ("cd /sys/bus/usb/devices; grep -i NDI */product", "r");

It then uses fgets() to read the file stream and pclose() to close the pipe.
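For reference, the popen()/fgets()/pclose() sequence described above can be sketched roughly as follows. This is a minimal sketch, not the original program: the helper name run_and_collect, the buffer sizes, and the error handling are assumptions added for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>

/* Hypothetical helper (not from the original program): run cmd through
 * popen(), copy its stdout into buf, and return the command's exit
 * status, or -1 if popen()/pclose() failed or the shell died abnormally. */
int run_and_collect(const char *cmd, char *buf, size_t buflen)
{
    if (buflen == 0)
        return -1;
    buf[0] = '\0';

    FILE *fp = popen(cmd, "r");
    if (fp == NULL) {               /* pipe or child creation failed */
        perror("popen");
        return -1;
    }

    char line[256];
    size_t used = 0;
    while (fgets(line, sizeof line, fp) != NULL) {
        size_t n = strlen(line);
        if (used + n < buflen) {    /* keep room for the terminating NUL */
            memcpy(buf + used, line, n + 1);
            used += n;
        }
    }

    int status = pclose(fp);        /* blocks until the shell exits */
    if (status == -1) {
        perror("pclose");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

It could be called as run_and_collect("cd /sys/bus/usb/devices; grep -i NDI */product", buf, sizeof buf). Note that grep exits with status 1 when nothing matches, so a non-zero return does not necessarily mean something went wrong.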

In a single-threaded program it worked well. However, after I integrated it into a multithreaded program, fgets() began to block the thread at random.

After checking, I found that fgets() blocks because sometimes the stream returned in fp never delivers any data. After I set fp's underlying file descriptor to non-blocking and read from it with read(), I can see that read() returns -1 because no data is available, and then pclose() hangs. All of this happens randomly.
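When read() returns -1 on a non-blocking descriptor, errno tells the cases apart: EAGAIN means no data has arrived yet, while a return value of 0 means the writer closed the pipe. One way to avoid blocking forever is to wait with poll() and a timeout before reading. The sketch below is an assumption for illustration only; the helper name read_with_timeout and its return convention are not from the original program.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for data on fd, then read it.
 * Returns the number of bytes read, 0 on EOF (writer closed the pipe),
 * -1 on error (errno is set), or -2 if the timeout expired with no data. */
ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    for (;;) {
        int ready = poll(&pfd, 1, timeout_ms);
        if (ready == 0)
            return -2;              /* nothing arrived in time */
        if (ready < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal, retry */
            return -1;
        }

        ssize_t n = read(fd, buf, len);
        if (n < 0 && (errno == EINTR || errno == EAGAIN))
            continue;               /* spurious wakeup, wait again */
        return n;
    }
}
```

With a stream from popen(), the descriptor would be obtained with fileno(fp). A timeout only reports that the child produced nothing; pclose() would still wait for a hung child, so the underlying hang has to be handled separately.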

So I think that in this case the command executed through popen() hangs and never terminates. But why does it happen randomly? The multithreaded program has only one other thread, which handles user interface interaction. I assumed this would be fine since the pipe is only used locally.

Any ideas are appreciated. Thanks!


UPDATE

strace shows that sometimes the child process calls futex() after a bunch of munmap() calls and then hangs there: futex(0xb72eaf00, FUTEX_WAIT_PRIVATE, 2, NULL

In a normal case futex would not be called.

For reference, the output of strace in main process:

1344962944.530384 pipe2([26, 27], O_CLOEXEC) = 0
1344962944.530441 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb41a8a68) = 25529
1344962944.534683 close(27)             = 0
1344962944.534801 fcntl64(26, F_SETFD, 0) = 0
1344962944.534852 write(1, "entering fgets\n", 15) = 15
1344962944.534924 fstat64(26, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
1344962944.534992 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb377d000
1344962944.535065 read(26, 

The output of strace for child process shell:

1344962944.534652 set_robust_list(0xb41a8a70, 0xc) = 0
1344962944.534790 getpid()              = 25529
1344962944.534843 getppid()             = 25453
1344962944.534917 close(12)             = 0
1344962944.534983 munmap(0xa8fff000, 1048576) = 0
...
1344962944.535925 munmap(0xb2728000, 118784) = 0
1344962944.535952 close(11)             = 0
1344962944.535980 close(10)             = 0
1344962944.536018 futex(0xb72eaf00, FUTEX_WAIT_PRIVATE, 2, NULL
Jim
  • You could use strace to see what your program and the children it spawns are doing, syscall-wise. – PlasmaHH Aug 14 '12 at 14:24
  • How about `find -L -maxdepth 2 -type f -name product | xargs grep -i NDI`? `find` will follow the symlinks in `/sys/bus/usb/devices` to a maxdepth of 2 (1 for the symlink, and 1 for each file in the symlinked directory) and look for regular files named 'product'. Each 'product' file found is piped into `xargs grep -i NDI`. – rkyser Aug 14 '12 at 16:45
  • Is there a device being removed while this is being run? – rkyser Aug 14 '12 at 16:52
  • strace shows that sometimes the child process calls futex() after a bunch of munmap() calls and then hangs there: futex(0xb72eaf00, FUTEX_WAIT_PRIVATE, 2, NULL In a normal case futex is not called. Would that be a memory leak problem? @PlasmaHH – Jim Aug 14 '12 at 17:12
  • The problem comes out with or without the device being connected @rkyser – Jim Aug 14 '12 at 17:15
  • Does the same command hang if you run it from an interactive shell instead of from your program? – Adam Rosenfield Aug 14 '12 at 17:21
  • Which child is blocked on that `futex`, the shell or the grep? `futex` calls often are an indication of intraprocess communication in a multithreaded process. So while one thread of the child blocks waiting for that futex, is there some other thread in the child (created by a `clone` syscall not followed by an `execve` call) which *could* eventually set the futex, but is currently blocked for some other reason? Neither shell nor grep usually are multithreaded, so I'm a bit surprised. – MvG Aug 14 '12 at 17:56
  • The command does not hang if I run it in an interactive shell. Actually the same function always worked well in a simpler, single-thread program. @Adam Rosenfield – Jim Aug 14 '12 at 20:12
  • The child shell is blocked on futex. There are no other threads in the child. Please refer to the strace output I just attached. Thanks. If the shell child is not blocked, it then spawns the grep child. @MvG – Jim Aug 14 '12 at 20:29
  • You mention a read that returns -1 - the obvious next question is: what is errno at that point? – themel Aug 15 '12 at 09:45
  • Also: http://libusb.sourceforge.net/api-1.0/group__dev.html – themel Aug 15 '12 at 09:46
  • A couple of years later... and I am getting the same behavior but on macOS. My SO question is: https://stackoverflow.com/questions/40408586/call-to-fgets-on-stream-generated-by-popen-hangs If anyone here had any more developments on this subject, I appreciate any input. – Santiago Alessandri Nov 03 '16 at 21:56

0 Answers