I need to add multithreading to a Python job.
I have a dictionary with about 40 keys, each mapping to a timestamped pandas DataFrame. Most of the DataFrames have 100,000+ rows. Their timestamps are strings in "%Y-%m-%d %H:%M:%S" format.
To convert the timestamp strings I use the following function:
    def to_dt(df):
        df['timestamp'] = df['timestamp'].map(lambda n: pd.to_datetime(n, format='%Y-%m-%d %H:%M:%S'))
        return df
So I would like to run each to_dt(df) call in a separate thread. How can I do that?
To simplify, let's consider the following setup:
    import pandas as pd

    def to_dt(df):
        df['timestamp'] = df['timestamp'].map(lambda n: pd.to_datetime(n, format='%Y-%m-%d %H:%M:%S'))
        return df

    # empty dictionary
    d_test = {}
    # dataframe with a single string timestamp column
    # (named 'timestamp' so that it matches the column to_dt converts)
    df = pd.DataFrame(columns=['timestamp'])
    # populate dataframe with 1000 timestamp rows
    for i in range(1000):
        df.loc[len(df)] = ['2018-10-02 10:00:00']
    # add 20 copies of the dataframe to the dictionary with keys 'a0' to 'a19'
    # (copies, so that the entries do not all share one object)
    for i in range(20):
        d_test['a'+str(i)] = df.copy()
Now, how can we make each iteration of
    for i in range(20):
        to_dt(d_test['a'+str(i)])
run in a separate thread?
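For reference, here is a minimal sketch of the kind of thing I have in mind, assuming `concurrent.futures.ThreadPoolExecutor` from the standard library is a suitable tool. The `max_workers=4` value is an arbitrary choice, and the test frame is built in one step with a `timestamp` column and one copy per key. I have not measured whether threads actually speed this up; since the parsing is CPU-bound, the GIL may well limit the gain.

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def to_dt(df):
    # Same conversion as above: parse each timestamp string individually.
    df['timestamp'] = df['timestamp'].map(
        lambda n: pd.to_datetime(n, format='%Y-%m-%d %H:%M:%S'))
    return df


# Test setup: 20 independent copies of a 1000-row frame, keys 'a0' to 'a19'.
df = pd.DataFrame({'timestamp': ['2018-10-02 10:00:00'] * 1000})
d_test = {'a' + str(i): df.copy() for i in range(20)}

# Submit one to_dt call per dictionary entry to a pool of worker threads.
# max_workers=4 is an assumption, not a tuned value.
with ThreadPoolExecutor(max_workers=4) as executor:
    futures = {key: executor.submit(to_dt, frame)
               for key, frame in d_test.items()}
    # result() blocks until the task is done and re-raises any exception.
    d_test = {key: future.result() for key, future in futures.items()}
```

Each task works on its own DataFrame, so the threads do not share mutable state in this sketch.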