# data is a list  

import threading

Threading_list = []

class myfunction(threading.Thread):

    def __init__(self, val):
        super().__init__()
        self.val = val

    def run(self):
        ...  # process self.val here

for i in range(100000):
    t = myfunction(data[i])  # need to execute this function on every datapoint
    t.start()
    Threading_list.append(t)

for t in Threading_list:
    t.join()

This will create around 100,000 threads, but I am only allowed to create a maximum of 32 threads. What modifications can be made to this code?

Mok

2 Answers


So many Python threads rarely need to be created; in fact, I can hardly imagine a reason for it. There are suitable architectural patterns for running code in parallel that limit the number of threads. One of them is the reactor pattern.

What are you trying to do?

And remember that, due to the GIL, Python threads do not give any performance boost for computational tasks, even on multiprocessor and multi-core systems (BTW, can there be a 100,000-core system? I doubt it. :)). The only chance for a boost is if the computational part is performed inside modules written in C/C++ that do their work without holding the GIL. Usually, Python threads are used to parallelize the execution of code that contains blocking I/O operations.
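For the I/O-bound case, one classic pattern that respects a hard thread cap is to start exactly 32 worker threads that pull values from a shared queue, instead of one thread per value. A minimal sketch (do_io is a hypothetical stand-in for the blocking operation, and data is the list from the question):

import queue
import threading

NUM_WORKERS = 32  # the cap from the question

def do_io(val):
    ...  # placeholder for a blocking I/O operation on one value

def worker(tasks):
    while True:
        try:
            val = tasks.get_nowait()  # grab the next value, if any
        except queue.Empty:
            return  # queue drained: this worker is done
        do_io(val)

tasks = queue.Queue()
for val in data:
    tasks.put(val)

workers = [threading.Thread(target=worker, args=(tasks,)) for _ in range(NUM_WORKERS)]
for t in workers:
    t.start()
for t in workers:
    t.join()

This never creates more than 32 threads, no matter how long data is.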

UPD: I noticed the stackless-python tag. AFAIK, it supports microthreads. However, it's still unclear what you are trying to do.

And if you are just trying to process 100,000 values (applying a formula to each of them?), it's better to write something like:

def myfunction(val):
    ...  # compute something from val
    return something_calculated_from_val

results = [myfunction(d) for d in data]  # or: results = list(map(myfunction, data))

It should be much better, unless myfunction() performs some blocking I/O. If it does, ThreadPoolExecutor may really help.

Ellioh
  • Unrelated to the OP (and while I mostly agree with you, and also note that this post is old), sometimes we don't control thread creation. E.g., I use socketio in Python, and the number of threads depends entirely on the events on the channels I am listening on. In that case, I see the thread count going as high as 5k whether I like it or not. And green/micro threads aren't so useful here; real threads can actually do this I/O-bound task in parallel, green threads cannot. – 0xc0de Jan 03 '22 at 07:20

Here is an example that will compute the squares of the values in a list of any length, using 32 threads through a ThreadPoolExecutor. As Ellioh said, you may not want to use threads in some cases, and then you can easily switch to ProcessPoolExecutor (a sketch of that variant follows the code).

import concurrent.futures

def my_function(x):
    return x**2  # square each value

data = [1, 6, 9, 3, 8, 4, 213, 534]

with concurrent.futures.ThreadPoolExecutor(max_workers=32) as executor:
    result = list(executor.map(my_function, data))

print(result)
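And a minimal sketch of the ProcessPoolExecutor variant mentioned above (note the __main__ guard, which is needed so that worker processes can import the module safely):

import concurrent.futures

def my_function(x):
    return x**2

data = [1, 6, 9, 3, 8, 4, 213, 534]

if __name__ == '__main__':
    # separate processes sidestep the GIL, so this can help CPU-bound functions
    with concurrent.futures.ProcessPoolExecutor(max_workers=32) as executor:
        result = list(executor.map(my_function, data))
    print(result)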
Oleh Prypin
  • In Python 2, [`multiprocessing.dummy.Pool` can be used](http://stackoverflow.com/a/14594205/4279) to avoid a 3rd-party dependency (see the sketch after these comments). There is no point in using threads for CPU-bound tasks in CPython. – jfs Feb 03 '13 at 08:07
  • There may be, but only if the computational part is performed in non-Python extension modules that do not hold the GIL while performing computations. Anyway, it is not clear what the question author is trying to achieve. I would hold off on any solutions containing code until he clarifies the problem. – Ellioh Feb 03 '13 at 08:08
  • Thanks Blaxpirit for your response. I don't have 100,000 cores; I have 100,000 data values that need to be processed separately. Is there any other way of doing it other than threads? – Mok Feb 05 '13 at 18:17
  • What is "processed separately"? What kind of processing is required? If you are trying to apply some mathematical expression to each of 100,000 values, can you just process them sequentially? Python threads cannot help with that; a single-threaded implementation is likely to be faster than one using reactors or thread pools. – Ellioh Feb 06 '13 at 07:11