
Can you have an object shared across multiple WSGI threads/processes (and in a manner that would work on both *NIX and Windows)?

The basic premise: (1) I have a WSGI front end that connects to a back-end server. I have a serialization class that contains rules for serializing/unserializing various objects, including classes specific to this project, so it needs some setup telling it how to handle custom objects. It is otherwise stateless: multiple threads can use it at the same time to serialize their data without issue. (2) There are sockets connecting to the back end. I'd rather have a socket pool than create and destroy a socket for every connection.

Note: I don't mind a solution where there are multiple instances of (1) and (2), but ideally, I'd like them created/initialized as few times as possible. I'm not sure about the internals, but if a thread loops rather than closing and having the server reopen on a new request, it would be fine to have one data set per thread (and hence, the socket and serializer are initialized once per thread, but reused each subsequent call it handles.) Actually having one socket per thread, if that's how it works, would be best, since I wouldn't need a socket pool and have to deal with mutexes.
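The one-data-set-per-thread idea above can be sketched with `threading.local`. This is only a sketch: it assumes the server reuses worker threads across requests (worth confirming with a diagnostic page), and `Serializer` and `make_backend_connection` are hypothetical stand-ins for the project's own serializer and socket setup.

```python
import threading

# Per-thread storage: each worker thread sees its own attributes on this object.
_local = threading.local()


class Serializer:
    """Hypothetical stand-in for the project's serializer."""

    def __init__(self):
        # Rules for custom objects would be registered here, once per thread.
        self.rules = {}


def make_backend_connection():
    """Hypothetical factory; a real one might call socket.create_connection()."""
    return object()


def get_thread_resources():
    """Return this thread's (serializer, connection), creating them on first use."""
    if not hasattr(_local, "serializer"):
        _local.serializer = Serializer()
        _local.connection = make_backend_connection()
    return _local.serializer, _local.connection
```

Calling `get_thread_resources()` at the top of `application` would then initialize once per thread and reuse the same objects on later requests handled by that thread, with no mutex needed because nothing is shared between threads.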

Note: this is not sessions and has nothing to do with sessions. This shouldn't care who is making the call to the server. It's only about tweaking performance on systems that have slow thread creation, or a lot of memory but relatively slow CPUs.

Edit 2: The below code will give some info on how your system shares variables. You'll need to load the page a few times to get some diagnostics...

from datetime import datetime
from html import escape
from os import getpid
from threading import RLock, get_ident

# Module-level state: shared by every thread in this process,
# but each process gets its own copy.
count = 0
responses = []
lock = RLock()

# Thread and process that first imported this module.
start_ident = "%08X::%08X" % (get_ident(), getpid())

show_env = False

def application(environ, start_response):
    # Declare globals before first use (required for the code to compile).
    global count, responses

    status = '200 OK'
    this_ident = "%08X::%08X" % (get_ident(), getpid())

    # Guard the shared counter and history against concurrent requests.
    with lock:
        current_response = """<HR>
<B>Request Number</B>: """ + str(count) + """<BR>
<B>Request Time</B>: """ + str(datetime.now()) + """<BR>
<B>Current Thread</B>: """ + this_ident + """<BR>
<B>Initializing Thread</B>: """ + start_ident + """<BR>
Multithread/Multiprocess: """ + str(environ["wsgi.multithread"]) + "/" + str(environ["wsgi.multiprocess"]) + """<BR>
"""
        count += 1
        responses.append(current_response)
        if len(responses) >= 100:
            responses = responses[1:]
        # Copy the history so it can be rendered outside the lock.
        history = list(responses)

    output = "<HTML><BODY>"

    if show_env:
        output += "<H2>Environment</H2><TABLE><TR><TD>Key</TD><TD>Value</TD></TR>"
        for k in environ.keys():
            output += "<TR><TD>" + escape(str(k)) + "</TD><TD>" + escape(str(environ[k])) + "</TD></TR>"
        output += "</TABLE>"
    output += "<H2>Response History</H2>"
    for r in history:
        output += r
    output += "</BODY></HTML>"

    # WSGI on Python 3 requires the response body to be bytes.
    body = output.encode('utf-8')
    response_headers = [('Content-type', 'text/html; charset=utf-8'),
                        ('Content-Length', str(len(body)))]
    start_response(status, response_headers)
    return [body]

2 Answers


For some reading on how the mod_wsgi process/threading model works, see:

Pay particular note to the section on building a portable application.

The comments there aren't really any different no matter what WSGI server you use. All WSGI servers also fall into one of those multi process/single process, multi thread/single thread categories.
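As a rough illustration of those categories, the two standard WSGI environ keys `wsgi.multithread` and `wsgi.multiprocess` can be read at request time to decide how module-level state should be treated. The helper name `sharing_scope` and the keys of the returned dict are made up for this sketch.

```python
def sharing_scope(environ):
    """Summarize what the WSGI flags imply for module-level globals."""
    multithread = bool(environ.get("wsgi.multithread", False))
    multiprocess = bool(environ.get("wsgi.multiprocess", False))
    return {
        # Other threads in this process may touch the same globals at once.
        "needs_locking": multithread,
        # With multiple processes, each process holds its own copy of the globals.
        "single_copy": not multiprocess,
    }
```

With both flags true, a global is shared (and must be locked) among the threads of one process, but a request that lands in another process sees a different copy.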

Graham Dumpleton
  • Thanks, I saw that, but wasn't sure on how to apply it. Basically: if(wsgi.multithread==True) - other executions in the same process will share the same variables that aren't function-scoped. If(wsgi.multiprocess==True) then they will share the same variables that aren't function-scoped. Let's say I'm in ThreadID 59926, complete a request. Will ThreadID 59926 potentially get some subsequent request, and have the same globals as 59926? – S James S Stapleton Apr 17 '13 at 13:05
  • If wsgi.multiprocess==True then global variables accessed by threads may not be the same, as the threads could be running in completely different processes. Did you mean 'they will not share' when talking about multi process? – Graham Dumpleton Apr 17 '13 at 23:39
  • I didn't know when they will be shared. I figured if it was multiprocess, they'd be shared to save resources (not a hard effect to achieve). I made a nice little demo I'll add to my original post. – S James S Stapleton Apr 19 '13 at 11:26

By my reading of http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading, if you have both multiprocessing and multithreading on (as with the Apache worker MPM, or with a daemon process configured like this):

WSGIDaemonProcess example processes=2 threads=25

then you have BOTH problems: multiple threads mean you can share a variable, but it is only shared within each of the 2 processes. There is no real way to share variables between processes unless you explicitly run another daemon (NON-APACHE) that handles the message passing and requests.

Let's say you have a simple need for a database connection pool. With the above configuration, you'd have two pools, each serving 25 threads. This is fine for most people, as threads are lightweight and processes aren't (supposedly).

So, how to implement this?

  • In one of your modules, create a variable that instantiates a connection pool. Have each thread (really, the code that services an individual request) get a connection at an appropriate time, use it, and return it to the pool. Don't forget the last part, or you'll run out of connections quickly.

  • Create another daemon (not Apache related). Instantiate an array of shared memory. Into this array put objects that each consist of a db connection and a process id (null to start). In a while-True loop, listen for connections on a socket, and when you get one, spawn a subprocess, passing in the shared array and the index of the array element to use. The subprocess fills in its own process id, handles the request, closes any cursors, then removes its process id and exits.

  • Hire a programmer familiar with WSGI and db connection pooling to do it for you ;-) .
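The first option above (a module-level pool, one per process) might look roughly like the following. It is a minimal sketch, not a production pool: `ConnectionPool` is a made-up class, and `factory=object` stands in for whatever actually opens a database or socket connection.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool; queue.Queue provides the thread-safe hand-off."""

    def __init__(self, factory, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout=None):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always hand the connection back, even if the request failed.
            self._idle.put(conn)


# Created once at import time, so there is one pool per process.
# factory=object is a placeholder for a real connection factory.
pool = ConnectionPool(factory=object, size=25)
```

Request code would then use `with pool.connection() as conn:`; the context manager takes care of the "return it to the pool" step that is easy to forget.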

Kevin J. Rice