
I'm working on a project where the throughput of my code is quite important, and after some consideration I chose to make my program threaded.

The main thread and the subthread both add to and remove from two shared dictionaries. I've been looking around the web for input on the performance of locking in Python: is it a slow operation, etc.?

So what I'm getting at is: since Python isn't actually parallel (the GIL means only one thread executes Python bytecode at a time), do I have anything to win by making my application threaded, other than handling I/O, if I need high performance?

EDIT

The actual question is (after an insightful comment):

Does multithreading make sense in python, since there's GIL?

Daniel Figueroa
  • Locking in *any* language is a performance bottleneck. Minimize locking where possible; don't use shared dictionaries, for example: create a tree instead and have each thread work in a different branch. – Martijn Pieters Aug 15 '12 at 08:54
  • Well that sounds reasonable since it's an atomic operation, but is the actual acquiring of a lock in Python expensive? – Daniel Figueroa Aug 15 '12 at 08:57
  • 2
    The actual question seems to be "Does multithreading make sense in python, since there's GIL?" – Flavius Aug 15 '12 at 08:58
  • You can use the multiprocessing module to make use of multiple cores. – Ramchandra Apte Aug 15 '12 at 09:05
  • @DanielFigueroa: No more so than in any other language; the locking ops are done in C in any case. – Martijn Pieters Aug 15 '12 at 09:07
  • Your question is a pretty generic one that cannot be answered precisely, because the right answer will depend on the details of the problem. Next time try to give example code of exactly what you are trying to do. – Roland Smith Aug 15 '12 at 09:17

2 Answers


IMO, locking impacts performance significantly mostly when multiple threads are actually waiting for it.

The cost of acquiring and releasing an uncontended lock should be trivial.

This thread includes a benchmark of exactly that:

Ok, here is the cost of acquiring and releasing an uncontended lock under Linux, with Python 3.2:

$ python3 -m timeit \
  -s "from threading import Lock; l=Lock(); a=l.acquire; r=l.release" \
  "a(); r()"

10000000 loops, best of 3: 0.127 usec per loop

And here is the cost of calling a dummy Python function:

$ python3 -m timeit -s "def a(): pass" "a(); a()"

1000000 loops, best of 3: 0.221 usec per loop

And here is the cost of calling a trivial C function (which returns the False singleton):

$ python3 -m timeit -s "a=bool" "a(); a()"

10000000 loops, best of 3: 0.164 usec per loop

Also, note that using the lock as a context manager is actually slower, not faster as you might imagine:

$ python3 -m timeit -s "from threading import Lock; l=Lock()" \
  "with l: pass"

1000000 loops, best of 3: 0.242 usec per loop

At least under Linux, there doesn't seem to be a lot of room for improvement in lock performance, to say the least.

PS: RLock is now as fast as Lock:

$ python3 -m timeit \
  -s "from threading import RLock; l=RLock(); a=l.acquire; r=l.release" \
  "a(); r()"

10000000 loops, best of 3: 0.114 usec per loop
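The figures above all cover the uncontended case. As a complement, here is a minimal sketch of the contended case (the `worker` function and iteration counts are my own illustration, not from the benchmark above); absolute timings vary too much by machine to quote, so it only verifies that a heavily contended lock still yields the exact result:

```python
import threading

counter = 0
lock = threading.Lock()

def worker(iterations):
    # Each increment is guarded by the shared lock, so four threads
    # contend for it heavily, but the final count is still exact.
    global counter
    for _ in range(iterations):
        with lock:
            counter += 1

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 4 threads x 100,000 increments = 400000
```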
Jcyrss

First of all, locking in any language is a performance bottleneck. Minimize locking where possible; don't use shared dictionaries, for example: create a tree instead and have each thread work in a different branch of that tree.
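One way to read "create a tree and work in different branches" is to give each thread its own private dict and merge only after all threads have finished, so the hot loop needs no lock at all. A sketch, assuming a made-up squaring workload and a 4-way split (the `worker` function and chunking are my own illustration):

```python
import threading

def worker(results, index, items):
    # This thread touches only its own "branch": a private dict
    # stored at results[index]. No other thread reads or writes it.
    branch = {}
    for item in items:
        branch[item] = item * item
    results[index] = branch

data = list(range(100))
chunks = [data[i::4] for i in range(4)]        # split the work 4 ways
results = [None] * 4
threads = [threading.Thread(target=worker, args=(results, i, chunk))
           for i, chunk in enumerate(chunks)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Merge lock-free: only the main thread is running at this point.
merged = {}
for branch in results:
    merged.update(branch)
print(len(merged))  # 100
```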

Since you'll be doing a lot of I/O, your performance problems will lie there, and threading is not necessarily going to improve matters. Look into event-driven architectures first.

The GIL is not likely to be your problem here; it'll be released whenever a thread enters C code, for example (almost certainly during any I/O call). If it ever does become a bottleneck, move to multiple processes. On a large intranet cluster I administer, for example, we run 6 processes of 2 threads each to make full use of all the CPU cores (2 of the processes carry a very light load).
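To illustrate the point that the GIL is released during blocking calls, here is a small sketch using `time.sleep` as a stand-in for a blocking I/O call (the `fake_io` function and the timings are my own example): five 0.2-second "calls" spread over five threads finish in roughly 0.2 s total, not 1 s.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(i):
    # time.sleep releases the GIL, just like a blocking socket or
    # file call would, so the five sleeps below overlap.
    time.sleep(0.2)
    return i

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(fake_io, range(5)))
elapsed = time.perf_counter() - start

print(results)        # [0, 1, 2, 3, 4]
print(elapsed < 1.0)  # True: the sleeps ran concurrently
```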

If you feel you need multiple processes, either use the multiprocessing module or make it easy to start multiple instances of your server (each listening on a different port) and use a load balancer such as haproxy to direct traffic to each server.

Martijn Pieters
  • Thank you, I will look into this and think about how I will go ahead. – Daniel Figueroa Aug 15 '12 at 09:11
  • Whether a particular lock is a bottleneck depends heavily on the program that uses it; the statement that locking is always a bottleneck and should always be minimized is often not pragmatic. The only thing that should always be minimized is the resources (time, money, etc.) you spend to reach your goal, e.g. the effort to create and maintain a program that does its job well enough. – Bob Jun 11 '18 at 16:18