2

I have a Python daemon running in production. Depending on the instance, it runs between 7 and 120 threads. Recently the smallest instance (7 threads) started to hang, while none of the other instances has ever shown this kind of problem. Attaching strace to the Python process shows that all threads are blocked in futex with FUTEX_WAIT_PRIVATE, so they are probably all waiting to acquire a lock.

How would you debug such a problem?

Note that this is a production system running from flash memory, so disk writes are constrained, too.

hlovdal
Helmut Grohne
  • Today I observed the same instance hanging in all but the console-thread. It was doing select(1, [0], ...). When I typed in a line, it also deadlocked in the same futex call as all the other threads (except one, which was using a different address). – Helmut Grohne Oct 12 '10 at 08:12
  • 1
    In the meantime, the global import lock was finally replaced in CPython 3.3, see http://hg.python.org/cpython/rev/edb9ce3a6c2e. That's great news for me... I already stumbled over this ~10 years ago, and again today, and each time it cost me more than a day to understand what was going on... –  Apr 26 '13 at 15:21

2 Answers

5

The observation was slightly incorrect. One thread wasn't calling futex; it was swapping while holding the GIL. Since the machine in question has low-end hardware, the swapping took very long and looked like a deadlock. The underlying problem is a memory leak. :-(
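For telling a real deadlock apart from one very slow thread without writing to disk, one option is to have the process dump the Python stacks of all threads to stderr on a signal. The following is only a minimal sketch, assuming Python 3.3+ on Linux; the helper name install_stack_dump_handler and the choice of SIGUSR1 are illustrative assumptions, not part of the daemon in question.

```python
import faulthandler
import signal
import sys


def install_stack_dump_handler(signum=signal.SIGUSR1, stream=sys.stderr):
    """Hypothetical helper: on `signum`, dump every thread's traceback.

    faulthandler writes straight to the stream's file descriptor from the
    C-level signal handler, so it does not need to run Python code and
    does not touch the disk when `stream` is stderr.
    """
    faulthandler.register(signum, file=stream, all_threads=True)


# Usage (assumed): call install_stack_dump_handler() once at daemon
# startup, then run `kill -USR1 <pid>` when the process appears hung.
```

Threads that are waiting on a lock show up parked in an acquire or import frame, while a thread that is still running (however slowly) shows up in ordinary code.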

prajmus
Helmut Grohne
2

Dear Helmut, I have the same problem with one thread hanging on FUTEX_WAIT_PRIVATE.

It seems you have solved the issue. Can you share more information about the solution?

UPD:

The reason for the hang was finally found (at least in my case): it was due to the import lock in Python.

Consider the following situation:

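file1.py:

    import file2

file2.py:

    import threading

    def Go():
        # Blocks: the main thread still holds the import lock
        import some_module
        # ....

    # All of this runs at import time, i.e. while file1 is importing file2
    thread2 = threading.Thread(target=Go, name="thread2")
    thread2.start()
    thread2.join()  # waits for Go(), which waits for the import lock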

Here the import in Go() hangs because the import lock is held by the main thread (for import file2) and will not be released until Go() finishes, which it never does. This applies to CPython versions with a global import lock, i.e. before 3.3. In strace this shows up as a hang in FUTEX_WAIT_PRIVATE.

To work around this, move the code that runs during the import of file2 into a function Do() and call it after importing file2:

import file2

file2.Do()
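As a rough sketch, file2.py could then look like the following. The module json is only a stand-in for some_module, and the results list is a hypothetical addition so the effect is visible; neither is from my project.

```python
import threading

results = []  # hypothetical: lets the caller see what Go() produced


def Go():
    # Safe here: Do() is only called after "import file2" has finished,
    # so the main thread no longer holds the import lock.
    import json  # stand-in for some_module
    results.append(json.dumps({"ok": True}))


def Do():
    # Nothing in this function runs at import time.
    thread2 = threading.Thread(target=Go, name="thread2")
    thread2.start()
    thread2.join()
    return results
```

Importing this module only defines Go() and Do(); the thread is created when the caller invokes file2.Do().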

PoltoS
  • 1
    Note that importing modules from a thread is discouraged in the Python documentation. – Helmut Grohne Jun 14 '11 at 10:51
  • You are right. A good idea is to import all needed modules in the first lines of the code, but this is unfortunately not possible in big projects. In mine, I prefer to keep these imports in the right places in the code for readability and obey a simple rule: import first, and only then start new threads. Works well for my huge project. – PoltoS Jun 20 '11 at 00:06