I am using pythons billiard multiprocessing package to run some queries in a postgres db. I have a list with years let's say [2020, 2021, 2022] and a list of queries that take the years in a parametrised where
clause. What I do is pass the the list to Pools
via map
and what I was expecting was to see in my db running parallel queries for different years, but instead what I see is 16 processes running for the same years.
Ex (where year = 2020) x 4 | (where year = 2021) x 4 etc... Could you please help me understand why this is happening?
Code used:
years = [2019, 2020, 2021]
params = zip(years, repeat(<table_name>), repeat(<sql>), repeat(<parameters>))
pool = Pool(processes=4)
try:
r = pool.map(run_sql, params)
except Exception as exc:
pool.close()
raise Exception(f'{name} generated an exception: {exc}')
else:
logging.info(f'''Loaded {name}-{r}
--------------------------------
''')
pool.close()
def run_sql(params):
year, name, sql, parameters = params[0], params[1], params[2], params[3]
try:
sql_filtered = sql + f''' where date_part('year', created_at) = {year}'''
<module to run sql>
return years
except Exception as e:
raise e