I'm accessing an API to get specific public budget data from Brazil. Each request needs a year, a month, and a page number. I managed to get the data I want for 2020 with nested for loops, looping through the months ({j}) and the pages (str(i+1)).

How can I parallelize the following? Ideally I'd like to turn it into a function (def) and use map.

import pandas as pd
import requests
from tqdm import tqdm

list1 = []

# x is the number of pages to request for each month
for i in tqdm(range(x)):
    for j in tqdm(range(1, 13)):
        url = f'https://gatewayapi.prodam.sp.gov.br:443/financas/orcamento/sof/v3.0.1/empenhos?anoEmpenho=2020&mesEmpenho={j}&codOrgao=84&numPagina=' + str(i+1)
        headers = {"Accept": "application/json", "Authorization": "Bearer xxxxxxxxxxxxxx"}
        response = requests.get(url, headers=headers)
        list1.append(response.json())

# DataFrame.append was removed in pandas 2.0, so build the frames and concatenate once
df_final = pd.concat([pd.DataFrame(page['lstEmpenhos']) for page in list1])

df_final

1 Answer

One thought might be to take the code in your nested for loop and break it out into a function:

def get_data(pair):
    i, j = pair
    url = f'https://gatewayapi.prodam.sp.gov.br:443/financas/orcamento/sof/v3.0.1/empenhos?anoEmpenho=2020&mesEmpenho={j}&codOrgao=84&numPagina=' + str(i+1)
    headers = {"Accept": "application/json", "Authorization": "Bearer xxxxxxxxxxxxxx"}
    response = requests.get(url, headers = headers)
    return response.json()

Then you could use something like the ThreadPoolExecutor and map that against your values. You could make this a lot better, but extremely naïvely:

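from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

parameters = []

# Build every (page, month) pair; x is the number of pages
for i in range(x):
    for j in range(1, 13):  # months 1 through 12
        parameters.append((i, j))

# The keyword for the pool size is max_workers
pool = ThreadPoolExecutor(max_workers=6)

# map returns a lazy iterator, so wrap it in list() to collect every result
list1 = list(tqdm(pool.map(get_data, parameters), total=len(parameters)))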
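For trying the pattern without touching the API, here is a minimal offline sketch. It assumes a hypothetical stand-in, `fake_get_data`, that only echoes its inputs instead of making the HTTP request, and an assumed page count `n_pages = 3` in place of `x`. It builds the (page, month) pairs with `itertools.product` and uses a `with` block so the pool is shut down automatically:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def fake_get_data(pair):
    """Hypothetical stand-in for get_data: no network, just echoes the request."""
    i, j = pair
    return {"page": i + 1, "month": j, "lstEmpenhos": [{"mes": j, "pagina": i + 1}]}


n_pages = 3  # assumed value standing in for x

# Every (page index, month) combination: pages 0..n_pages-1, months 1..12
parameters = list(product(range(n_pages), range(1, 13)))

# The with-block shuts the pool down once all tasks have finished
with ThreadPoolExecutor(max_workers=6) as pool:
    # map() yields results in the same order as `parameters`
    results = list(pool.map(fake_get_data, parameters))

print(len(results))  # 36
```

Because `map` preserves input order, `results[k]` always corresponds to `parameters[k]`, even though the requests themselves may complete out of order.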
josephkibe
  • Thank you, @josephkibe. I think the def function is correct, but something is wrong with the second part: it processes too fast and the result I get is `<generator object Executor.map.<locals>.result_iterator at 0x000001731998A0B0>`. If I check the `type` of list1, it says it's a `generator`. I will try to fix that. – Guilherme Giuliano Nicolau Apr 17 '21 at 16:08
  • Happy to help. I'll admit I wrote all this in the question editor, so apologies if it doesn't quite work as-is. I think it should get you 90% there, though. Feel free to edit my answer with code that actually works! – josephkibe Apr 17 '21 at 16:09
  • Got it. I just had to wrap the pool.map call in list() and set the parameters, like this: `list1 = list(pool.map(get_data, parameters[0:4]))` – Guilherme Giuliano Nicolau Apr 19 '21 at 19:01
  • Great! Glad you were able to figure it out. – josephkibe Apr 19 '21 at 21:25
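
The generator reported in the comments above can be reproduced without any network access. In this small sketch, `square` is a hypothetical placeholder for the real request function: `pool.map` submits the tasks right away but hands back a lazy generator of results, and wrapping it in `list()` is what collects them.

```python
from concurrent.futures import ThreadPoolExecutor
import types


def square(n):
    """Hypothetical placeholder for a slow task such as an HTTP request."""
    return n * n


with ThreadPoolExecutor(max_workers=2) as pool:
    lazy = pool.map(square, range(5))  # a generator, not a list
    is_generator = isinstance(lazy, types.GeneratorType)
    values = list(lazy)  # consuming the generator collects the results

print(is_generator)  # True
print(values)        # [0, 1, 4, 9, 16]
```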