
I'm trying to split a long-running process across multiple processes using the concurrent.futures module. The code is below.

Main function:

with concurrent.futures.ProcessPoolExecutor() as executor:
    for idx, score in zip([idx for idx in range(dataframe.shape[0])],executor.map(get_max_fuzzy_score,[dataframe[idx:idx+1] for idx in range(dataframe.shape[0])])):
        print('processing '+str(idx+1)+' of '+str(dataframe.shape[0]+1))
        dataframe['max_row_score'].iloc[idx] = score

get_max_fuzzy_score function:

def get_max_fuzzy_score(picklepath_or_list, df):
    import numpy as np
    extracted_text_columns = list(df.filter(regex='extracted_text').columns)
    data_list = [df[data].iloc[0] for data in extracted_text_columns if not df[data].isnull().values.any()]
    try:
        size = len(picklepath_or_list)
        section_snippet_list = picklepath_or_list
    except:
        section_snippet_list = pickle.load(open(picklepath_or_list,'rb'))
    scores = []
    for section_snippet in section_snippet_list:
        for data in data_list:
            scores.append(fuzz.partial_ratio(data,section_snippet))
    score = max(scores)

    return score

The function takes the values of a few columns and returns the maximum fuzzy score against a previously built list.

Here's the error I get:

Traceback (most recent call last):
  File "multiprocessing.py", line 8, in <module>
    import concurrent.futures
  File "/home/naveen/anaconda3/lib/python3.6/concurrent/futures/__init__.py", line 17, in <module>
    from concurrent.futures.process import ProcessPoolExecutor
  File "/home/naveen/anaconda3/lib/python3.6/concurrent/futures/process.py", line 53, in <module>
    import multiprocessing
  File "/home/naveen/Documents/pramata-ie/data-science/scripts/multiprocessing.py", line 79, in <module>
    with concurrent.futures.ProcessPoolExecutor() as executor:
AttributeError: module 'concurrent' has no attribute 'futures'
Naveen Bharadwaj

2 Answers


You can import it this way:

import concurrent.futures

and use it this way:

executor = concurrent.futures.ThreadPoolExecutor(max_workers=num_workers)

You can also import ThreadPoolExecutor this way:

from concurrent.futures.thread import ThreadPoolExecutor

and use it this way:

executor = ThreadPoolExecutor(max_workers=num_workers)
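
As a rough illustration of how such an executor is typically used, here is a minimal sketch. The `square` worker and the value of `num_workers` are hypothetical stand-ins, not taken from the question, and the import uses the package-level name `concurrent.futures.ThreadPoolExecutor`, which refers to the same class as the `concurrent.futures.thread` import above.

```python
from concurrent.futures import ThreadPoolExecutor

num_workers = 4  # assumption: illustrative value, tune for your workload


def square(x):
    """Hypothetical stand-in for the real per-item work."""
    return x * x


# The context manager shuts the pool down when the block exits.
with ThreadPoolExecutor(max_workers=num_workers) as executor:
    # map() yields results in the same order as the inputs.
    results = list(executor.map(square, range(5)))

print(results)
```

Note that this only runs if the script itself is not named after a standard-library module such as multiprocessing.py.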
The AI Architect
    Thanks, that helps (upvoted). But I don't understand why this is happening. If futures is a submodule, we should be able to access it with dot notation. – Himanshu Patel Jan 15 '22 at 14:58

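Don't name your Python file threading.py or multiprocessing.py. A script with one of those names shadows the standard-library module of the same name. In the traceback from the question, concurrent.futures.process runs `import multiprocessing` and picks up the script /home/naveen/Documents/pramata-ie/data-science/scripts/multiprocessing.py instead of the standard library. The script then reaches `concurrent.futures.ProcessPoolExecutor()` before the concurrent.futures package has finished importing, hence the AttributeError. Renaming the file fixes it.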
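
One way to check whether a local file is shadowing a standard-library module is to ask the import system where it would load the module from. The sketch below is only an illustration; `module_origin` is a hypothetical helper name, not part of any library.

```python
import importlib.util


def module_origin(name):
    """Return the file path a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Print where each module name currently resolves.
for name in ("threading", "multiprocessing"):
    print(name, "->", module_origin(name))
```

If a printed path points into your project directory rather than the Python installation, a local file is shadowing the module and should be renamed.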

Tonmoy