
EDIT: This question is probably based on a mistake. I was getting the thread id with threading.current_thread().ident and got the same value for subsequent calls, which led me to think threads were being reused, but that does not seem to be a reliable thread id. With threading.current_thread().name I get different names for each request, so threads are probably not being reused. Also, the documentation I link in the question refers to the xmlrpc-c library, not Python's. The thing is that I need to communicate with an xmlrpc-c server from another Python XMLRPC server. My tests suggest that xmlrpc-c does reuse threads when requests come over the same connection (since I can take advantage of thread caches), but I am not totally sure.
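To show what misled me, here is a small illustration of my own (not part of the original confusion, just a demo): after a thread exits, its ident can be handed out again to a later thread, while the auto-generated name is unique per thread.

```python
import threading

results = []

def record():
    t = threading.current_thread()
    results.append((t.ident, t.name))

# Run two threads one after the other, so the first thread's ident
# is free for reuse by the time the second one starts.
for _ in range(2):
    t = threading.Thread(target=record)
    t.start()
    t.join()

(ident_a, name_a), (ident_b, name_b) = results
print(name_a != name_b)   # True: names are always distinct
print(ident_a, ident_b)   # the idents may well be the same value
```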


Sorry if the title is a bit strange, but I didn't know how to make it clearer.

The thing is, when using an XMLRPC server with ThreadingMixIn, it spawns a new thread to serve each request, but it leaves the connection open for some time, so it is always the same thread that serves all the requests arriving over that connection. There is a limit: by default, up to 30 requests can share the same connection. This is described in the "Parameters" section of the documentation of ServerAbyss:

http://xmlrpc-c.sourceforge.net/doc/libxmlrpc_server_abyss.html#server_abyss_run_server

The way to reuse a connection is to send requests to the XMLRPC server through the same ServerProxy object, rather than creating a new one immediately before each call, which I think is the usual way of making an RPC call.
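As a minimal, self-contained sketch of what I mean (the threaded server and the echo method are made up for the example), a single ServerProxy reused across calls keeps the underlying HTTP connection alive, whereas constructing a fresh proxy per call opens a new connection each time:

```python
import threading
import xmlrpc.client
from socketserver import ThreadingMixIn
from xmlrpc.server import SimpleXMLRPCServer

# A threaded XMLRPC server for the client to talk to.
class ThreadedXMLRPCServer(ThreadingMixIn, SimpleXMLRPCServer):
    daemon_threads = True

server = ThreadedXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(lambda s: s, "echo")
host, port = server.server_address
threading.Thread(target=server.serve_forever, daemon=True).start()

# One proxy, many calls: the transport keeps the HTTP connection open,
# instead of creating a proxy (and a new connection) per call.
proxy = xmlrpc.client.ServerProxy(f"http://{host}:{port}/RPC2")
replies = [proxy.echo(x) for x in ("a", "b", "c")]
print(replies)  # ['a', 'b', 'c']

server.shutdown()
```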

Well, this behaviour is very useful for me, as it lets me take advantage of some thread caches. But the real improvement would be to get this kind of behaviour with ForkingMixIn instead of ThreadingMixIn. The problem is that ForkingMixIn seems to always spawn a new process to handle each request, without taking into account whether the request arrives over an already-open connection.

So, is there a way in which ForkingMixIn could "reuse" a process in the same way that ThreadingMixIn reuses a thread to handle a request arriving from an already opened connection?

Marcos Fernandez
    Are you talking about a [`socketserver.ThreadingMixIn`](https://docs.python.org/3/library/socketserver.html#asynchronous-mixins), or a similar class provided by a third-party library? Because I don't think the stdlib class reuses threads like that. If you look at [the source](http://hg.python.org/cpython/file/default/Lib/socketserver.py#l612), each `process_request` spawns a new thread. – abarnert Sep 10 '14 at 09:04
    Duplicate of http://stackoverflow.com/questions/11699471/is-there-any-pool-for-threadingmixin-and-forkingmixin-for-socketserver ? – Aaron Digulla Sep 10 '14 at 09:07
  • Anyway, if you are talking about the `socketserver` classes, you can subclass `ForkingMixin` and override its [`process_request`](http://hg.python.org/cpython/file/default/Lib/socketserver.py#l587) to look up something appropriate about the `request` (e.g., if your XMLRPC library has the socket fd on the `request` object, you can use that) in a cache. The only trick is that you'll need some way to actually send the request information over to the child process, which presumably means giving a `multiprocessing.Queue` or `Pipe` to each one. – abarnert Sep 10 '14 at 09:07
    @AaronDigulla: Nice find. Even if it's not a dup (I think it probably is, but I'm not sure about what the OP is asking…), the `concurrent.futures` answer there is exactly how I'd do this (except that I can't remember the last time I used `socketserver`, or anything built on top of it…), and very well explained. – abarnert Sep 10 '14 at 09:13
  • @abarnert: I think you are right. I was getting the thread id with threading.current_thread().ident and got the same value for subsequent calls, but that does not seem to be a reliable thread id. With threading.current_thread().name I get different names for each request, so threads are probably not being reused. Also, the documentation I attached in the question refers to the xmlrpc-c library, not Python's. The thing is that I need to communicate with an xmlrpc-c server from another Python XMLRPC server. My tests suggest that xmlrpc-c does reuse threads, but I am not totally sure. – Marcos Fernandez Sep 10 '14 at 10:17
  • @AaronDigulla: I had seen that question before, but this is not a duplicate (maybe wrong/nonsense, but not a duplicate), as I don't really want a pool of threads/processes serving the requests. I would need requests arriving from the same connection to be served by the same process, which does not seem to be possible. – Marcos Fernandez Sep 10 '14 at 10:21
  • @MarcosFernandez: At least on POSIX, both thread IDs and process IDs can be reused as long as no two simultaneous processes on the system or simultaneous threads in the same process ever have the same ID. Many platforms deliberately avoid reusing process IDs to make scripting easier; many of them don't do the same for thread IDs, just using the lowest one available, so you'll see collisions a lot more often with threads… but either way, you can't assume either that it will always be a new ID or that it will always be the last one. – abarnert Sep 10 '14 at 17:52
  • Anyway, as I said in my second comment, picking a process out of a cache isn't hard as long as you have some reliable key that means "same session"; sending the request data to the process over a `Queue` or `Pipe` is a bit more work, but it's certainly doable. – abarnert Sep 10 '14 at 17:54

2 Answers


socketserver.ForkingMixIn cannot do this.

But then neither can socketserver.ThreadingMixIn, so your premise is wrong. So the first question is: you thought you had connection pooling on threads, you didn't, and you never noticed a problem, so are you sure you actually need it?


If you do, you will have to write your own mixin. Fortunately, you can subclass ForkingMixIn to do the heavy lifting of launching and managing processes, and just add your own code on top of it; that code should sit comfortably within the only usefully-documented method, process_request.
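A skeleton of such a mixin might look like this. The session-key extraction and the _hand_off IPC are deliberately left as stubs, because they are exactly the hard parts discussed in the rest of this answer; the cache is a WeakValueDictionary so dead children drop out of it on their own.

```python
import weakref
from socketserver import ForkingMixIn
from xmlrpc.server import SimpleXMLRPCServer

class SessionForkingMixIn(ForkingMixIn):
    def __init__(self, *args, **kwargs):
        self._children = weakref.WeakValueDictionary()
        super().__init__(*args, **kwargs)

    def _session_key(self, request):
        # Stub: derive a stable session key from the request
        # (cookie, HTTP header, XML element, ...).
        return None

    def process_request(self, request, client_address):
        child = self._children.get(self._session_key(request))
        if child is None:
            # No cached child for this session: fall back to the
            # normal fork-per-request behaviour.
            super().process_request(request, client_address)
        else:
            self._hand_off(child, request)

    def _hand_off(self, child, request):
        # Stub: send the request to the cached child over IPC.
        raise NotImplementedError

class SessionForkingXMLRPCServer(SessionForkingMixIn, SimpleXMLRPCServer):
    pass

srv = SessionForkingXMLRPCServer(("127.0.0.1", 0), logRequests=False)
print(isinstance(srv, ForkingMixIn))  # True
srv.server_close()
```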

Also, note that, like many modules in the stdlib, socketserver is meant to be readable and useful sample code as well as a useful library, which is why the docs link to the source right at the top. That makes things a lot easier.


Unfortunately, while plugging your code into ForkingMixIn will be easy, designing and writing that code will not.

The basic concept is simple: you have a cache of child processes, keyed off the connection. For each request, you look up the connection in the cache, create a new child if needed, then pass the request to the appropriate child.

Unfortunately, every one of those steps is easier said than done.

Well, not the first. That's just a WeakValueDictionary. But beyond that:

  • What is a "connection"? XMLRPC is a pure stateless request-response protocol. If you assume every address is a connection you get false positives (running two clients at once, or just two users inside the same NAT); if you assume every socket is a connection you get false negatives (clients can, and will, close the socket and open a new one for any reason they want). The usual answer is to use an application-level session stored in an HTTP cookie (or an HTTP header or an XML element), so now you have to write or use an HTTP session manager (including handling reap-on-timeout).
  • How do you "pass the request" to the child? ForkingMixIn relies on the fact that the request is a local variable pre-fork and therefore still available as a local variable post-fork. That's not true for some object you pull out of a cache. So, you need to use a multiprocessing.Queue or Pipe, or some other IPC mechanism. And, while Queue makes things trivial for anything that can be pickled, a request object almost certainly can't be pickled, because it includes things like the socket files. So, you need to write code that bundles everything up into something that can be passed—which includes either doing fd migration (which Python doesn't handle for you) or reading all the request data to pass that instead (so the other side can then wrap it in, e.g., a BytesIO to stick in self.rfile).
  • How does the child respond? Again, ForkingMixIn relies on the fact that it's inherited the wfile file from its parent, but again, that obviously isn't going to work here. If you haven't done fd migration, then you need to pass the response back over IPC (e.g., another Queue). In which case your parent has to block on that IPC. So, you're going to need to use a ThreadingMixIn to create a thread per request just so they can wait on your child processes (or some alternative mechanism, like a child-service thread that loops over all the IPCs as composable futures or as select-able pipes or whatever).

None of this is particularly hard conceptually (except the session-timeout issue), but it's all a whole lot of code.
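To make the second and third bullets concrete, here is a toy sketch (not a real XMLRPC handler, and the "protocol" is invented): the parent passes raw request bytes over a multiprocessing.Queue, because the request object itself can't be pickled; the child rebuilds an rfile-like object from them with BytesIO; and the reply comes back over a second Queue, since the child never inherited the connection's wfile.

```python
import io
import multiprocessing as mp

def child_main(requests: mp.Queue, responses: mp.Queue) -> None:
    while True:
        raw = requests.get()
        if raw is None:          # sentinel: shut down
            break
        rfile = io.BytesIO(raw)  # stand-in for the handler's self.rfile
        line = rfile.readline().decode()
        responses.put(f"handled: {line.strip()}".encode())

if __name__ == "__main__":
    requests, responses = mp.Queue(), mp.Queue()
    child = mp.Process(target=child_main, args=(requests, responses))
    child.start()

    # The parent would normally block on responses.get() in a
    # per-request thread, as the third bullet describes.
    requests.put(b"GET /RPC2 HTTP/1.1\r\n")
    print(responses.get())       # b'handled: GET /RPC2 HTTP/1.1'

    requests.put(None)
    child.join()
```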

abarnert
  • Very useful, thank you. In the end, the simplest way to solve my problem was to create a list of connection objects shared between threads and use them in round-robin fashion, checking first whether each one is in use. – Marcos Fernandez Sep 25 '14 at 11:56
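The commenter's final approach might look roughly like this; the pool class and URL are invented for illustration (ServerProxy construction does not open a connection, so this runs as-is). Each proxy is paired with a lock, and acquire() walks the pool round-robin, skipping proxies that are currently in use:

```python
import itertools
import threading
import xmlrpc.client

class ProxyPool:
    """A fixed pool of ServerProxy objects shared between threads."""

    def __init__(self, url, size=4):
        self._entries = [(xmlrpc.client.ServerProxy(url), threading.Lock())
                         for _ in range(size)]
        self._next = itertools.cycle(range(size))
        self._guard = threading.Lock()

    def acquire(self):
        # Round-robin over the pool, checking whether each proxy
        # is already in use before handing it out.
        with self._guard:
            for _ in range(len(self._entries)):
                proxy, lock = self._entries[next(self._next)]
                if lock.acquire(blocking=False):
                    return proxy, lock
        raise RuntimeError("all proxies busy")

pool = ProxyPool("http://localhost:8000/RPC2", size=2)
proxy, lock = pool.acquire()
try:
    pass  # proxy.some_method(...) would go here
finally:
    lock.release()
```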

I think the problem here is passing the data to the process. You can give the process the connection (so it can handle everything by itself), but then you'll never know whether the connection has been closed or is no longer needed. Also, the child process can't easily create a new connection after N requests.

Or you can keep the connection in the parent process and just pipe the data to the child (for example via a stdout -> stdin pipe). But that means you have to copy each byte many times (take it out of the parent's receive buffer, put it into the send buffer of the parent's pipe, probably copy it a couple more times while the pipe does its magic, copy it into the child's input buffer, ...) and then do it all again for each byte of the result. Not an issue for small requests that take some time to process, but probably something I'd keep an eye on.
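A quick illustration of that second option (the child program and payload are invented for the example): the parent keeps ownership of the data and relays it through the child's stdin/stdout pipes, so every byte gets copied on the way in and again on the way out.

```python
import subprocess
import sys

# A trivial stand-in for the worker: it reads its whole stdin,
# "processes" it (upper-cases it), and writes the result to stdout.
child_code = "import sys; sys.stdout.write(sys.stdin.read().upper())"
payload = "request body from the client connection\n"

child = subprocess.run(
    [sys.executable, "-c", child_code],
    input=payload, capture_output=True, text=True,
)
print(child.stdout)  # REQUEST BODY FROM THE CLIENT CONNECTION
```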

Aaron Digulla