7

Suppose I've written a WSGI application and run it on Apache2 on Linux under a multi-threaded mod_wsgi configuration, so that each process runs the application in several threads:

WSGIDaemonProcess mysite processes=3 threads=2 display-name=mod_wsgi
WSGIProcessGroup mysite
WSGIScriptAlias / /some/path/wsgi.py

The application code is:

def application(environ, start_response):
    from foo import racer
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    return [racer()]  # does calling racer() create a race condition?

module foo.py:

a = 1
def racer():
    global a
    a = a + 1
    return str(a)

Did I just create a race condition on the variable a? My guess is that a is a module-level variable that lives in foo.py and is shared among the threads of a process.

More theoretical questions derived from this:

  1. Concurrent threads within the same process read and modify the same variable a, so my example is not thread-safe?
  2. If my web server is Apache, is each thread of my application on Linux created at the C level with the pthreads API, with the function the pthread executes being some kind of main function of the Python interpreter? Or does Apache somehow protect me from this error?
  3. What if I ran this on a web server written in Python, such as Tornado's HTTPServer? Such a server implements threads as Python-level threading.Thread objects and runs the application function in each thread, so I suppose it's a race condition there too? (I also suppose that in this case I can ignore the C-level pthreads underneath threading.Thread and worry only about Python functions, because the interpreter won't let me modify C-level shared data and break its functioning. So the only way for me to break thread-safety is through global variables? Is that right?)
Boris Burkov
  • There's a lot of moving parts here. Do you think you could narrow your question to the actual, specific concern that you have? – Robert Harvey May 15 '14 at 18:07
  • Robert, the practical side is: 1) I'm preparing to write my custom server-side application on `Tornado` or `Flask` or something and trying to understand, if I can create race conditions by inaccurate actions with global data. Can I introduce modifications in imported modules? 2) I'm trying to understand, why I don't encounter race conditions in my application now with `apache` although I didn't concern myself with thread-safety at all yet. – Boris Burkov May 15 '14 at 18:15
  • Can your question be answered in a practical way, without writing a book chapter? – Robert Harvey May 15 '14 at 18:16
  • @RobertHarvey ok, sorry if I'm annoying. Here's an example in EDIT. Did I just create a race condition? – Boris Burkov May 15 '14 at 18:21

2 Answers

4

Yes, you have a race condition there, but it's not related to the imports. The global state in foo.a is subject to a data race between reading a (to compute a + 1) and assigning the result back to a: two threads can see the same value of a and thus compute the same successor.
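To make that interleaving concrete, here is a minimal sketch that forces it deterministically. The `threading.Barrier` and the name `racer_forced` are additions for illustration only and are not part of the question's code; the barrier simply makes both threads finish their read before either one writes, which is the unlucky schedule that a real server would hit only occasionally.

```python
import threading

a = 1
# Illustration only: a barrier that releases once both threads have read `a`.
both_have_read = threading.Barrier(2)

def racer_forced():
    global a
    seen = a                # step 1: read the shared value
    both_have_read.wait()   # pause until the other thread has read it too
    a = seen + 1            # step 2: write a successor of the (now stale) value

threads = [threading.Thread(target=racer_forced) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(a)  # two increments ran, yet one update was lost
```

Both threads read 1 and both write 2, so one increment disappears. Without the barrier the same loss can still happen, just rarely and unpredictably.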

The import machinery itself does protect against duplicate imports by multiple threads, by means of a process-wide lock (see imp.lock_held()). Although this could, in theory, lead to a deadlock, it almost never happens in practice, because few Python modules lock other resources at import time.

This also suggests that it's probably safe to modify sys.path at will: this usually happens only at import time (for the purpose of additional imports), so the thread doing it already holds the import lock, and other threads cannot trigger imports that would also modify that state.

Fixing the race in racer() is quite easy, though:

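import threading
a = 1
a_lock = threading.Lock()  # guards every read-modify-write of a

def racer():
    global a
    with a_lock:
        # increment and take a private copy while still holding the lock
        my_a = a = a + 1
    # my_a is local, so it is safe to use after the lock is released
    return str(my_a)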

The same kind of locking is needed for any global, mutable state under your control.
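One way to convince yourself that the locked version behaves is to hammer it from several threads and check that no value is handed out twice. This is only a sketch: the thread count, the call count, and the `worker` helper are arbitrary choices made for the illustration.

```python
import threading

a = 1
a_lock = threading.Lock()

def racer():
    global a
    with a_lock:
        my_a = a = a + 1
    return str(my_a)

N_THREADS, CALLS_PER_THREAD = 8, 1000   # arbitrary sizes for the check
results = []
results_lock = threading.Lock()          # the shared results list needs a lock too

def worker():
    mine = [racer() for _ in range(CALLS_PER_THREAD)]
    with results_lock:
        results.extend(mine)

threads = [threading.Thread(target=worker) for _ in range(N_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = N_THREADS * CALLS_PER_THREAD
print(len(set(results)) == total)  # every call received a distinct value
print(a == 1 + total)              # no increment was lost
```

A passing run of the unlocked version would prove nothing, since the race may simply not have fired; with the lock, the property holds on every run.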

SingleNegationElimination
  • Thanks for your answer! The phrase about import was a mistake - before current version, I had another example with import, but thanks for your note about lock. I've got another thought: theoretically, web-server implementation of thread spawning could create a new instance of python interpreter's data in dynamic memory + its per-thread stack, so some web-servers could just save us from those races at the cost of memory overhead? – Boris Burkov May 15 '14 at 21:21
  • That's a pretty big question, I don't think I can answer it in the space of a comment. Instead of trying to come up with ways to avoid data races on global state, you can resolve the issue completely by having no (mutable) global state in the first place. – SingleNegationElimination May 16 '14 at 02:02
2

Read the mod_wsgi documentation on the various process/thread configurations, and especially what it says about data sharing.

In particular it says:

Where global data in a module local to a child process is still used, for example as a cache, access to and modification of the global data must be protected by local thread locking mechanisms.

Graham Dumpleton