I'm trying to create a thread pool in Python, and I thought all was well until I ran some tests.

In my tests I record the time it takes for n tasks to complete with x threads, then plot the data to check whether it resembles what I expect. Unfortunately, I'm getting unexpected results: for certain tests, delta t doesn't fall on the line that the other points lie on. I believe this is a thread synchronization issue related to my use of threading.Event(). I'm not very familiar with Python, so I may be overlooking something.

What's causing my thread pool to give unexpected results in my tests? Any help is appreciated, thanks!

Thread Count:

[2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40]

Delta t:

[15.005759, 0.002, 5.003255, 1.995844, 0.99826, 0.006006, 0.997074, 0.994626, 0.002004, 0.988823, 0.005081, 0.993242, 0.990138, 0.002986, 0.995473, 0.000999, 0.986356, 0.002002, 0.975053, 0.021287]
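
As a rough baseline, assuming an ideal pool that runs the 30 one-second tasks in rounds of x at a time (one reading of the "expected line", not something measured), delta t should be about ceil(30 / x) seconds. A small sketch of that arithmetic; ideal_seconds is a hypothetical helper, not part of the code below:

```python
import math

NUM_TASKS = 30       # tasks queued per test, as in the test loop
TASK_SECONDS = 1.0   # each task sleeps for one second


def ideal_seconds(num_threads, num_tasks=NUM_TASKS, task_seconds=TASK_SECONDS):
    """Idealized wall time: tasks run in rounds of num_threads at a time."""
    return math.ceil(num_tasks / num_threads) * task_seconds


thread_counts = list(range(2, 42, 2))  # 2, 4, ..., 40
baseline = [ideal_seconds(n) for n in thread_counts]
print(baseline)
```

Measured values far below this baseline, such as 0.002 s with 4 threads, suggest that the wait returned before the tasks had actually run.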

Below is my code:

from threading import Thread
from threading import Event
from queue import Queue
from time import sleep
import matplotlib.pyplot as plt
import datetime
import requests
import random
import json
import copy
import sys


class Tester:

    def __init__(self):
        pass

    def run(self):
        numThreads = 2
        x = []  # thread count
        y = []  # delta time

        for t in range(20): # Run t tests
            threadPool = ThreadPool(numThreads)
            startTime = datetime.datetime.now()
            x.append(numThreads)

            print("Starting test %d" % (t + 1))

            # Add n tasks
            for n in range(30):
                threadPool.addTask(PoolTask(n))

            threadPool.start() # wait until all tasks are added before starting

            # Wait for all tasks in queue to complete
            threadPool.wait()

            timeDelta = datetime.datetime.now() - startTime
            print("Test %d complete (t=%f s, n=%d)" % ((t + 1), timeDelta.total_seconds(), numThreads))
            y.append(timeDelta.total_seconds())

            numThreads += 2

        # After the tests plot the resulting data
        print(x)
        print(y)
        plt.plot(x, y, 'ro')
        plt.xlabel('thread count')
        plt.ylabel('delta t (time)')
        plt.show()

class ThreadPool:
    __poolEvent = Event()
    __runEvent = Event()

    def __init__(self, size = 1, start=False):
        self.__size = size
        self.__queue = Queue()
        self.__pool = []
        self.__destroyed = False

        if start:
            __runEvent.set()

        # Create the thread pool
        for i in range(self.__size):
            thread = Thread(target = self.__worker, args = [i])
            self.__pool.append(thread)
            thread.daemon = True
            thread.start()
            pass

    def __worker(self, workerNumber):
        # Worker will run until thread pool is terminated
        while True:
            if(self.__destroyed):
                break

            self.__runEvent.wait() # Wait until threadpool is started

            task = self.__queue.get() # Blocking

            try:
                task.execute()
            except (AttributeError, TypeError):
                raise Exception('Task does not have execute() defined.')

            self.__queue.task_done()

            if self.__queue.empty():
                self.__poolEvent.set() # Allow caller to proceed if waiting

    def start(self):
        self.__runEvent.set()

    def addTask(self, task):
        if(self.__destroyed):
            raise Exception('Unable to add task the pool has already been destroyed.')
        self.__poolEvent.clear()    # Have caller wait if listening
        self.__queue.put(task)

    def destroy(self):
        if(self.__destroyed):
            raise Exception('Cannot destroy as the thread pool has already been destroyed.')

        # Flag causes threads to stop pulling from queue and return
        self.__destroyed = True

    def wait(self):
        self.__poolEvent.wait()

class PoolTask:
    """ example task that implements execute() """

    def __init__(self, taskNumber):
        self.taskNumber = taskNumber

    def execute(self):
        #print('Task %d executing...' % self.taskNumber)
        #sleep(random.randint(1, 5))
        sleep(1)
        #print('Task %d done executing' % self.taskNumber)

Execution Output:

>>> import Tester
>>> kp = KrogoPoints.KrogoPoints()
>>> kp.run()
Starting test 1
Test 1 complete (t=15.005759 s, n=2)
Starting test 2
Test 2 complete (t=0.002000 s, n=4)
Starting test 3
Test 3 complete (t=5.003255 s, n=6)
Starting test 4
Test 4 complete (t=1.995844 s, n=8)
Starting test 5
Test 5 complete (t=0.998260 s, n=10)
Starting test 6
Test 6 complete (t=0.006006 s, n=12)
Starting test 7
Test 7 complete (t=0.997074 s, n=14)
Starting test 8
Test 8 complete (t=0.994626 s, n=16)
Starting test 9
Test 9 complete (t=0.002004 s, n=18)
Starting test 10
Test 10 complete (t=0.988823 s, n=20)
Starting test 11
Test 11 complete (t=0.005081 s, n=22)
Starting test 12
Test 12 complete (t=0.993242 s, n=24)
Starting test 13
Test 13 complete (t=0.990138 s, n=26)
Starting test 14
Test 14 complete (t=0.002986 s, n=28)
Starting test 15
Test 15 complete (t=0.995473 s, n=30)
Starting test 16
Test 16 complete (t=0.000999 s, n=32)
Starting test 17
Test 17 complete (t=0.986356 s, n=34)
Starting test 18
Test 18 complete (t=0.002002 s, n=36)
Starting test 19
Test 19 complete (t=0.975053 s, n=38)
Starting test 20
Test 20 complete (t=0.021287 s, n=40)
LW001
masterwok
  • Are you aware of `multiprocessing.pool.ThreadPool`? That provides a `ThreadPool` implementation with an API that matches the `multiprocessing.Pool` API. – dano Jul 13 '14 at 15:58
  • Thanks I'll check it out :] I figured something like this existed but haven't really written much in Python. – masterwok Jul 13 '14 at 16:04
  • No problem. That particular class is completely undocumented, so not a lot of people are aware of it. – dano Jul 13 '14 at 16:06
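
For comparison, here is a minimal sketch of the same kind of timing test built on `multiprocessing.pool.ThreadPool`, as suggested in the comments above. The `task` and `run_tasks` helpers and the shortened 0.05 s sleep are assumptions made for illustration, not part of the original code:

```python
from multiprocessing.pool import ThreadPool
from time import perf_counter, sleep


def task(task_number, seconds=0.05):
    """Stand-in for PoolTask.execute(): sleep briefly, then return the task number."""
    sleep(seconds)
    return task_number


def run_tasks(num_threads, num_tasks):
    """Run num_tasks on num_threads worker threads and time the whole batch."""
    start = perf_counter()
    with ThreadPool(num_threads) as pool:
        # map() blocks until every task has finished and keeps results in input order
        results = pool.map(task, range(num_tasks))
    return results, perf_counter() - start


results, elapsed = run_tasks(num_threads=4, num_tasks=8)
print(results, round(elapsed, 2))
```

Because `map()` only returns once all tasks are done, no separate event is needed to signal completion.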

1 Answer

I think your problem is that you defined __poolEvent and __runEvent as class variables, when you really want them to be instance variables. That's causing the state of __poolEvent and __runEvent to be shared between your ThreadPool instances:

>>> import threading
>>> class A(object):
...   event = threading.Event()  # class variable, like yours
...   def getEvent(self): return self.event
... 
>>> a = A()
>>> a.getEvent()
<threading._Event object at 0x7f4c51d3db90>
>>> a.getEvent().is_set()
False
>>> b = A()
>>> b.getEvent().set()
>>> a.getEvent().is_set()
True   # Uh oh, changing b's event also changed a's
>>> b.getEvent()
<threading._Event object at 0x7f4c51d3db90> # Because it's the same event!

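Because of this unexpected sharing, your events are sometimes set when you're not expecting them to be. __runEvent is never cleared, so it stays set after the first test's start() and later pools begin working as soon as tasks are added. The worker threads of earlier pools also stay alive, and when one of them finishes a task and finds its queue empty, it sets the shared __poolEvent, which can let wait() return in a later test before that test's own tasks have finished.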

Your results will look much saner if you change the top of the class to look like this:

class ThreadPool(object):

    def __init__(self, size = 1, start=False):
        self.__poolEvent = Event()
        self.__runEvent = Event()
        self.__size = size
        self.__queue = Queue()
        self.__pool = []
        self.__destroyed = False

        if start:
            self.__runEvent.set()

The output then looks much better:

Starting test 1
Test 1 complete (t=15.019228 s, n=2)
Starting test 2
Test 2 complete (t=7.009718 s, n=4)
Starting test 3
Test 3 complete (t=5.007426 s, n=6)
Starting test 4
Test 4 complete (t=3.005955 s, n=8)
Starting test 5
Test 5 complete (t=3.006504 s, n=10)
Starting test 6
Test 6 complete (t=2.004594 s, n=12)
Starting test 7
Test 7 complete (t=2.004225 s, n=14)
Starting test 8
Test 8 complete (t=1.004068 s, n=16)
Starting test 9
Test 9 complete (t=1.004277 s, n=18)
Starting test 10
Test 10 complete (t=1.004509 s, n=20)
Starting test 11
Test 11 complete (t=1.004266 s, n=22)
Starting test 12
Test 12 complete (t=1.003043 s, n=24)
Starting test 13
Test 13 complete (t=1.004713 s, n=26)
Starting test 14
Test 14 complete (t=1.003422 s, n=28)
Starting test 15
Test 15 complete (t=1.003525 s, n=30)
Starting test 16
Test 16 complete (t=1.003448 s, n=32)
Starting test 17
Test 17 complete (t=1.002924 s, n=34)
Starting test 18
Test 18 complete (t=1.003600 s, n=36)
Starting test 19
Test 19 complete (t=1.003569 s, n=38)
Starting test 20
Test 20 complete (t=1.003708 s, n=40)
[2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40]
[15.019228, 7.009718, 5.007426, 3.005955, 3.006504, 2.004594, 2.004225, 1.004068, 1.004277, 1.004509, 1.004266, 1.003043, 1.004713, 1.003422, 1.003525, 1.003448, 1.002924, 1.0036, 1.003569, 1.003708]
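
One detail in these timings: 4 threads finish 30 one-second tasks in about 7 s, one round short of the 8 s that ceil(30 / 4) rounds would take. That is consistent with wait() returning as soon as a worker sees the queue empty, while the last tasks are still running. Below is a minimal sketch of one way to wait for actual completion, assuming `Queue.join()` fits your use case. `JoinPool`, `add_task`, `make_task` and the `None` sentinel are hypothetical names, and unlike your pool this one starts working as soon as tasks are added:

```python
from queue import Queue
from threading import Thread
from time import perf_counter, sleep


class JoinPool:
    """Hypothetical minimal pool whose wait() relies on Queue.join()."""

    def __init__(self, size):
        self._size = size
        self._queue = Queue()  # instance attribute, so each pool has its own queue
        for _ in range(size):
            Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            func = self._queue.get()  # blocks until a task or a sentinel arrives
            try:
                if func is None:      # sentinel: stop this worker
                    return
                func()
            finally:
                self._queue.task_done()  # always balance the get()

    def add_task(self, func):
        self._queue.put(func)

    def wait(self):
        # Returns only after every queued item has had task_done() called for it
        self._queue.join()

    def destroy(self):
        for _ in range(self._size):
            self._queue.put(None)     # one sentinel per worker


done = []  # list.append is safe to call from several threads in CPython


def make_task(n):
    def run():
        sleep(0.05)
        done.append(n)
    return run


pool = JoinPool(4)
start = perf_counter()
for n in range(6):
    pool.add_task(make_task(n))
pool.wait()
elapsed = perf_counter() - start
print(len(done), round(elapsed, 2))
pool.destroy()
```

With 4 workers and 6 tasks, a wait based on `queue.empty()` could return after the first round, while `join()` returns only after all 6 tasks have completed.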
dano
  • Thank you! This works great. I'm still confused as to why they must be instance variables though as there is only one ThreadPool instance? – masterwok Jul 13 '14 at 17:17
  • @jtsan You keep creating new instances of the `ThreadPool` in a loop. While there's only one instance alive at a time, the changes you make to class variables of a class live forever, since they're changes on the **class**, not the instance of the class. So, when you change `instance1.__poolEvent`, that change to the `__poolEvent` class variable will be reflected in every instance of `ThreadPool` from now until eternity, even if `instance1` gets deleted, because `__poolEvent` is owned by the class, not the instance. – dano Jul 13 '14 at 17:24
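
The point in the last comment can be checked with a short script. This is a minimal sketch; `SharedPool` and `IsolatedPool` are hypothetical stand-ins for the two ways of declaring the event, not the original `ThreadPool`:

```python
from threading import Event


class SharedPool:
    event = Event()            # class attribute: one Event shared by every instance


class IsolatedPool:
    def __init__(self):
        self.event = Event()   # instance attribute: a fresh Event per instance


first = SharedPool()
first.event.set()              # mutates the Event owned by the class
del first                      # the instance goes away, the class attribute does not
second = SharedPool()
print(second.event.is_set())   # True: state left over from the deleted instance

third = IsolatedPool()
third.event.set()
del third
fourth = IsolatedPool()
print(fourth.event.is_set())   # False: each instance starts with its own unset Event
```

This mirrors the test loop: each new pool created in the loop inherits whatever state earlier pools left on the class-level events.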