I have the following query that returns data between two dates.
dates = ['20100101', '20100201', '20100301', '20100401']
query = "select date, company_name, total_amount from info_stats where date between 'start_date' and 'end_date'"
I obtain date ranges from another process and construct a list so that I can iterate as follows:
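pds = []
for idx in range(0, len(dates) - 1):
    # Build the query for the range [dates[idx], dates[idx + 1]];
    # strftime assumes the dates are datetime objects at this point.
    formated_query = self.get_formated_query(start_date=dates[idx].strftime('%Y%m%d'),
                                             end_date=dates[idx + 1].strftime('%Y%m%d'))
    # list.append returns None, so collect the DataFrames in pds directly.
    pds.append(pd.read_sql(sql=formated_query, con=db_connect))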
For each query I pass the date at index as the start date and the date at index + 1 (the next, later date) as the end date.
These queries take a very long time, and I want to execute them in parallel so that the wait is shorter. I looked at joblib but am not sure whether it is multi-threading or multi-processing; it looks like the former. I am also new to joblib. How can I parallelize the above code using joblib or another package?
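To make the goal concrete, here is a rough sketch of the shape I have in mind, using a thread pool from the standard library rather than joblib. It assumes the work is I/O-bound (mostly waiting on the database), so threads should be enough. `run_query` is a hypothetical stand-in for my `pd.read_sql` call and just echoes its arguments; `max_workers=4` is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor

dates = ['20100101', '20100201', '20100301', '20100401']


def run_query(start_date, end_date):
    # Hypothetical stand-in for pd.read_sql(...): a real version would
    # format the query and return a DataFrame for this date range.
    return (start_date, end_date)


# Pair each date with the next one: (d0, d1), (d1, d2), (d2, d3).
ranges = list(zip(dates, dates[1:]))

# Run one query per range in a pool of worker threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() yields results in the same order as the input ranges.
    results = list(pool.map(lambda r: run_query(*r), ranges))
```

With real queries I assume each worker would need its own database connection, since I am not sure a single connection can be shared safely across threads.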