
I have this loop:

listGames = []
for home in range(totalPlayers - 1):
    for away in range(home + 1, totalPlayers):
        listGames.append((home, away))

print listGames
match_num = 1
for game in listGames:
    player1 = listPlayers[game[0]]
    player2 = listPlayers[game[1]]
    do_stuff(player1, player2)

When there are a lot of players, this loop can take quite some time, so I want to use threads to complete the loop faster. However, player1 and player2 are instances of classes, so doing stuff with them simultaneously would be bad. EDIT: Otherwise, the order in which these 'tasks' are executed does not matter.

I found http://www.troyfawkes.com/learn-python-multithreading-queues-basics/, which seems to be exactly what I want, but I'm unsure how to adapt it to make sure that only one instance of a class/player is being used at once.
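For reference, the worker/queue pattern that tutorial builds up looks roughly like this (just a sketch; `do_work`, `items`, and `num_worker_threads` are placeholders, not names from my real code):

import threading
from Queue import Queue  # on Python 3: from queue import Queue

def worker():
    # Each thread repeatedly pulls a task off the shared queue.
    while True:
        item = q.get()
        do_work(item)  # placeholder for the actual task
        q.task_done()

q = Queue()
for _ in range(num_worker_threads):
    t = threading.Thread(target=worker)
    t.daemon = True  # don't block interpreter exit
    t.start()

for item in items:
    q.put(item)
q.join()  # block until every queued item has been processed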

(Simple) example:

totalPlayers = 4

players: 0, 1, 2, 3
listGames = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

so games (0, 1) and (2, 3) can be executed simultaneously, but the others will have to wait until those are done.
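For illustration, here is one greedy way to group this example's games into conflict-free "rounds" (just a sketch; a game joins a round only if neither of its players is already in it):

rounds = []
for game in listGames:
    for rnd in rounds:
        # The game fits if it shares no player with any game in this round.
        if not any(p in g for g in rnd for p in game):
            rnd.append(game)
            break
    else:
        rounds.append([game])

print rounds  # [[(0, 1), (2, 3)], [(0, 2), (1, 3)], [(0, 3), (1, 2)]]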

Hints/Ideas?

LordAro
  • How long is "a lot of time", and can you optimize `do_stuff`? Unless there's a lot of waiting in `do_stuff`, threads may actually slow you down in Python. Multiprocessing _can_ be an option, but it really, really depends on where your bottlenecks actually are. Also, look at `multiprocessing.Pool` and the `multiprocessing.dummy` module. – g.d.d.c Jun 11 '14 at 01:10
  • There is indeed a lot of waiting -- on the order of minutes -- `do_stuff()` is rather greatly simplified :) – LordAro Jun 11 '14 at 01:11

2 Answers


Unless do_stuff() is I/O-bound, this will probably make your code slower because of the global interpreter lock. Based on your statement that this loop "can take quite some time" when there are a lot of players, I'm inclined to think your program is CPU-bound -- in which case multithreading will probably hurt your performance.

Speaking to your original question: each batch of games that can run simultaneously is an exact cover of your set of players by two-element subsets. Exact cover is NP-complete in general, but this special case is just a perfect matching, and scheduling all of the games is classic round-robin tournament scheduling -- the "circle method" constructs an optimal schedule directly.
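For what it's worth, here's a minimal sketch of the circle method (assuming an even number of players; `round_robin_rounds` is just an illustrative name, not a standard function):

def round_robin_rounds(n):
    # Fix players[0] in place and rotate the others; after n - 1 rotations
    # every pair of players has met exactly once.
    players = list(range(n))  # assumes n is even
    rounds = []
    for _ in range(n - 1):
        rounds.append([(players[i], players[n - 1 - i]) for i in range(n // 2)])
        players.insert(1, players.pop())  # rotate everyone except players[0]
    return rounds

print(round_robin_rounds(4))
# [[(0, 3), (1, 2)], [(0, 2), (3, 1)], [(0, 1), (2, 3)]]

Each inner list is a set of games that can safely run at the same time.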

Patrick Collins
  • That's what `multiprocessing` is for! – dano Jun 11 '14 at 01:46
  • @dano keep in mind that `multiprocessing` is a heavy, heavy duty tool with serious overhead for startup as well as communication costs. It takes a lot of work to get good performance out of something as lightweight as pthreads -- I would be really surprised if LordAro gets an improvement out of it. Unless he's doing some really computationally intensive stuff, I'd bet that there's probably much more performance to be gained by working on his existing code than adding new complexity. Even if he is -- then it's time to look into `numpy`, Cython, etc. – Patrick Collins Jun 11 '14 at 12:09
  • I mean also note that his parallelism isn't even great to begin with, and just finding the covering sets he needs for optimal parallelism is probably asymptotically more expensive than any of the code he's written so far. – Patrick Collins Jun 11 '14 at 12:11
  • The OP states that `do_stuff` takes on the order of minutes to complete. The extra overhead of starting up a `multiprocessing.Pool` (less than a second) and of passing his class instances between the processes (also less than a second each, unless he's got enormous data structures in each instance) is inconsequential compared to the amount of time `do_stuff` takes. I do agree that Cython/numpy will likely speed up his code, too, but without seeing what's actually going on in `do_stuff` it's hard to say for certain what the best option is. – dano Jun 11 '14 at 14:17

Here's a sample program that shows one way to do this. The idea is to create a multiprocessing.Pool to run many instances of do_stuff simultaneously, while the parent process maintains a set of every player currently being processed, so that the same player is never in more than one running game at a time. As do_stuff finishes with a game, it tells the parent (via a shared queue) that it's done with those two players, so new games using them can be scheduled.

import time
import multiprocessing
from Queue import Empty  # on Python 3: from queue import Empty

listGames = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

def init_worker(queue):
    # A multiprocessing.Queue must be shared with the workers through
    # inheritance, so hand it to each worker via the pool initializer.
    global q
    q = queue

def do_stuff(game):
    player1, player2 = game
    print("player1 {} player2 {}".format(player1, player2))
    time.sleep(5)
    # Imagine some other stuff happens here.
    print("now done with {} and {}".format(player1, player2))
    q.put(game)  # Tell the parent these two players are free again.

if __name__ == "__main__":
    q = multiprocessing.Queue()
    pool = multiprocessing.Pool(initializer=init_worker, initargs=(q,))
    gamesSet = set(listGames)  # Convert to a set so removal is cheap.
    running = set()  # This keeps track of players being processed.
    while gamesSet:
        to_remove = []
        for game in gamesSet:
            # Only schedule a game if neither player is already playing.
            if game[0] not in running and game[1] not in running:
                running.add(game[0])
                running.add(game[1])
                pool.apply_async(do_stuff, (game,))
                to_remove.append(game)
        for game in to_remove:
            gamesSet.remove(game)
        # Block until at least one game finishes, then drain the queue;
        # this avoids busy-waiting while nothing new can be scheduled.
        done = q.get()
        running.remove(done[0])
        running.remove(done[1])
        while True:
            try:
                done = q.get_nowait()
            except Empty:
                break
            running.remove(done[0])
            running.remove(done[1])
    pool.close()
    pool.join()

Output:

dan@dantop2:~$ ./mult.py 
player1 0 player2 1
player1 2 player2 3
now done with 0 and 1
now done with 2 and 3
player1 1 player2 2
player1 0 player2 3
now done with 0 and 3
now done with 1 and 2
player1 1 player2 3
player1 0 player2 2
now done with 1 and 3
now done with 0 and 2
dano
  • This looks perfect, and indeed works with test code (with indentation fixes and moving `to_remove` into scope), but when I integrate it into my actual code, it breaks. As far as I can tell, `running` gets appended to successfully, but `do_stuff` apparently never happens, nor is anything appended to `to_remove`. Any ideas? – LordAro Jun 11 '14 at 10:58
  • After some investigating, it turns out that giving `do_stuff` multiple arguments appears to stop it from being called, but even if I remove the multiple arguments, it still never finishes (`q.get_nowait()` never happens). – LordAro Jun 11 '14 at 11:17
  • @LordAro, Apologies for the issues with the sample code, I think I've fixed them all in my most recent edit. The issue with passing multiple arguments is probably caused by an exception occurring in trying to call `do_stuff`. You should be able to tell by doing `result = pool.apply_async(do_stuff, (player,)); result.get()`. The `result.get()` will wait for the `apply_async` call to finish, which should cause the exception to be raised. If you don't call `get`, the exception never shows up. If you're still having problems after doing that, edit your code into your question and I'll help fix it. – dano Jun 11 '14 at 14:22
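In other words, the error-surfacing pattern described in the last comment is simply (a minimal sketch, using the names from the answer above):

result = pool.apply_async(do_stuff, (game,))
result.get()  # blocks until do_stuff returns, re-raising any exception it hit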