# data is a list  

import threading

Threading_list = []

class myfunction(threading.Thread):

    def __init__(self, val):
        super().__init__()
        self.val = val

    def run(self):
        ...  # process self.val here

for i in range(100000):
    t = myfunction(data[i])  # need to execute this function on every datapoint
    t.start()
    Threading_list.append(t)

for t in Threading_list:
    t.join()

This will create around 100,000 threads, but I am only allowed to create a maximum of 32 threads. What modifications can be made to this code?

Mok

2 Answers


So many Python threads rarely need to be created; in fact, I can hardly imagine a reason for it. There are suitable architectural patterns for running code in parallel that limit the number of threads. One of them is the reactor pattern.

What are you trying to do?

And remember that, due to the GIL, Python threads do not give any performance boost for computational tasks, even on multiprocessor and multi-core systems (BTW, can there be a 100,000-core system? I doubt it. :)). The only chance for a boost is if the computational part is performed inside modules written in C/C++ that do their work without holding the GIL. Usually, Python threads are used to parallelize the execution of code that contains blocking I/O operations.
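For the I/O-bound case, one classic pattern that respects a hard thread cap is to start exactly 32 worker threads that pull values from a shared queue, instead of one thread per value. A minimal sketch (do_io is a hypothetical stand-in for the blocking operation, and data is the list from the question):

import queue
import threading

NUM_WORKERS = 32  # the cap from the question

def do_io(val):
    ...  # placeholder for a blocking I/O operation on one value

def worker(tasks):
    while True:
        try:
            val = tasks.get_nowait()  # grab the next value, if any
        except queue.Empty:
            return  # queue drained: this worker is done
        do_io(val)

tasks = queue.Queue()
for val in data:
    tasks.put(val)

workers = [threading.Thread(target=worker, args=(tasks,)) for _ in range(NUM_WORKERS)]
for t in workers:
    t.start()
for t in workers:
    t.join()

This never creates more than 32 threads, no matter how long data is.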

UPD: I noticed the stackless-python tag. AFAIK, it supports microthreads. However, it's still unclear what you are trying to do.

And if you are just trying to process 100,000 values (applying a formula to each of them?), it's better to write something like:

def myfunction(val):
    ...  # compute something from val
    return something_calculated_from_val

results = [myfunction(d) for d in data]  # or: results = list(map(myfunction, data))

It should be much better, unless myfunction() performs some blocking I/O. If it does, ThreadPoolExecutor may really help.

Ellioh
  • Unrelated to the OP (and while I mostly agree with you, and also note that this post is old), sometimes we don't control thread creation. E.g., I use socketio in Python, and the number of threads depends entirely on the events on the channels I am listening on. In that case, I see the thread count going as high as 5k whether I like it or not. And green/micro threads aren't so useful here; real threads can actually do this I/O-bound task in parallel, green threads cannot. – 0xc0de Jan 03 '22 at 07:20

Here is an example that will compute the squares of the values in a list of any length, using 32 threads through a ThreadPoolExecutor. As Ellioh said, you may not want to use threads in some cases, and then you can easily switch to ProcessPoolExecutor (a sketch of that variant follows the code).

import concurrent.futures

def my_function(x):
    return x**2  # square each value

data = [1, 6, 9, 3, 8, 4, 213, 534]

with concurrent.futures.ThreadPoolExecutor(max_workers=32) as executor:
    result = list(executor.map(my_function, data))

print(result)
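And a minimal sketch of the ProcessPoolExecutor variant mentioned above (note the __main__ guard, which is needed so that worker processes can import the module safely):

import concurrent.futures

def my_function(x):
    return x**2

data = [1, 6, 9, 3, 8, 4, 213, 534]

if __name__ == '__main__':
    # separate processes sidestep the GIL, so this can help CPU-bound functions
    with concurrent.futures.ProcessPoolExecutor(max_workers=32) as executor:
        result = list(executor.map(my_function, data))
    print(result)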
Oleh Prypin
  • In Python 2, [`multiprocessing.dummy.Pool` can be used](http://stackoverflow.com/a/14594205/4279) to avoid a 3rd-party dependency (see the sketch after these comments). There is no point in using threads for CPU-bound tasks in CPython. – jfs Feb 03 '13 at 08:07
  • There may be, but only if the computational part is performed in non-Python extension modules that do not hold the GIL while performing computations. Anyway, it is not clear what the question author is trying to achieve. I would hold off on any solutions containing code until he clarifies the problem. – Ellioh Feb 03 '13 at 08:08
  • Thanks Blaxpirit for your response. I don't have 100,000 cores; I have 100,000 data values that need to be processed separately. Is there any other way of doing it other than threads? – Mok Feb 05 '13 at 18:17
  • What is "processed separately"? What kind of processing is required? If you are trying to apply some mathematical expression to each of 100,000 values, can you just process them sequentially? Python threads cannot help with that; a single-threaded implementation is likely to be faster than one using reactors or thread pools. – Ellioh Feb 06 '13 at 07:11