To iterate through each row of a DataFrame I use .iterrows():

list_soccer = pd.DataFrame({
    'EventName': [obj_event.event.name for obj_event in matches],
    'IDEvent': [obj_event.event.id for obj_event in matches],
    'LocalEvent': [obj_event.event.venue for obj_event in matches],
    'CodeCountry': [obj_event.event.country_code for obj_event in matches],
    'TimeZone': [obj_event.event.time_zone for obj_event in matches],
    'OpenDate': [obj_event.event.open_date for obj_event in matches],
    'Total_Market': [obj_event.market_count for obj_event in matches],
    'Local_Date': [obj_evento.event.open_date.replace(tzinfo=datetime.timezone.utc).astimezone(tz=None) 
                            for obj_evento in matches]
    })

for_iterate = list_soccer.reset_index()
for_iterate = for_iterate[for_iterate['EventName'].str.contains(" v ")]
data_for_compare = (datetime.datetime.utcnow()).strftime("%Y-%m-%d %H:%M")
for_iterate = for_iterate[for_iterate['OpenDate'] >= data_for_compare]
    
for index, example_dataframe in for_iterate.iterrows():
    multiprocessing.Process(target=add_hour, args=(example_dataframe,))

As I need to double the speed of this iteration (by running two processes at the same time), I'm looking for a way to work with two rows at a time.

If it were a regular list (please note that I am only giving the example below to demonstrate what I need; I understand that a list and a DataFrame are not the same thing), I could do it like this:

a_list = ['a','b','c','d']
a_pairs = [a_list[i:i+2] for i in range(0, len(a_list)-1, 2)]
# a_pairs = [['a','b'],['c','d']]
for a, b in a_pairs:
    multiprocessing.Process(target=add_hour, args=(a,))
    multiprocessing.Process(target=add_hour, args=(b,))

How should I proceed with DataFrame to work with two rows at the same time?

In this question, I found two answers, but both deliver options that repeat values from the DataFrame:
Pandas iterate over DataFrame row pairs

What I am not able to build is a version in which the rows are not repeated: use rows 0 and 1, then 2 and 3, then 4 and 5. Someone may say this question is a duplicate, but my need is different, and I was not able to adapt those options to it.

Digital Farmer

1 Answer

You should be able to split the DataFrame in two, using the same kind of slicing you would use on a list.

Then you can iterate over both halves at once, which gives you two rows at a time in order (0 and 1, then 2 and 3, and so on).

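df_a = for_iterate.iloc[::2]   # Rows at even positions: 0, 2, 4, ...
df_b = for_iterate.iloc[1::2]  # Rows at odd positions: 1, 3, 5, ...

for (_, example_dataframe_a), (_, example_dataframe_b) in zip(df_a.iterrows(), df_b.iterrows()):
    proc_a = multiprocessing.Process(target=add_hour, args=(example_dataframe_a,))
    proc_b = multiprocessing.Process(target=add_hour, args=(example_dataframe_b,))
    proc_a.start()  # A Process does nothing until start() is called
    proc_b.start()
    # Wait for both to finish before dispatching the next pair
    proc_a.join()
    proc_b.join()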

(Although it's unclear to me why you need to spawn a process for each row of the dataframe, rather than two processes, one for each half of for_iterate).
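One caveat with the even/odd split: `zip` stops at the shorter iterator, so when the number of rows is odd the last row is silently skipped. A minimal sketch of one way around that, assuming it is acceptable to receive `None` as the second item of the final pair, uses `itertools.zip_longest`. The helper name `iter_row_pairs` and the `demo` frame are hypothetical and only there for illustration:

```python
from itertools import zip_longest

import pandas as pd


def iter_row_pairs(df):
    """Yield (row_a, row_b) for rows 0-1, 2-3, ...; row_b is None if the row count is odd."""
    evens = (row for _, row in df.iloc[::2].iterrows())   # positions 0, 2, 4, ...
    odds = (row for _, row in df.iloc[1::2].iterrows())   # positions 1, 3, 5, ...
    # zip_longest pads the shorter iterator with None instead of stopping early
    yield from zip_longest(evens, odds)


# Hypothetical five-row frame to show the odd-length case
demo = pd.DataFrame({"EventName": ["A v B", "C v D", "E v F", "G v H", "I v J"]})
pairs = [
    (a["EventName"], None if b is None else b["EventName"])
    for a, b in iter_row_pairs(demo)
]
```

Inside the loop you would then start the second process only when `row_b is not None`.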

Alternatively:

You could try using multiprocessing.Pool.map() to perform two requests at once. Unlike the above approach, a new request would be made as soon as a previous one completes (so it wouldn't wait for both to finish before dispatching the next two), and only two processes would be needed which could be re-used:

from multiprocessing import Pool

def add_hour_wrapper(data):
    # iterrows() yields (index, row) tuples; we only want the row
    _, row = data
    return add_hour(row)

with Pool(2) as pool:  # 2 worker processes, re-used for every row
    pool.map(add_hour_wrapper, for_iterate.iterrows())
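If `add_hour` mostly waits on a network call, a thread pool may be enough to cap the work at two calls in flight. This is a swapped-in technique, not the `multiprocessing.Pool` approach above: a minimal sketch using the standard library's `concurrent.futures.ThreadPoolExecutor`, where `add_hour` is a hypothetical stand-in for the real function and `process_rows` and `demo` are illustrative names:

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def add_hour(row):
    # Hypothetical stand-in: the real add_hour would send the row to the API
    return row["EventName"].upper()


def process_rows(df, max_concurrent=2):
    """Run add_hour on every row with at most max_concurrent calls running at once."""
    rows = (row for _, row in df.iterrows())  # drop the index, keep the row
    with ThreadPoolExecutor(max_workers=max_concurrent) as executor:
        # executor.map returns results in the same order as the input rows
        return list(executor.map(add_hour, rows))


demo = pd.DataFrame({"EventName": ["a v b", "c v d", "e v f"]})
results = process_rows(demo)
```

Because the executor hands out a new row as soon as a worker is free, an odd number of rows needs no special handling here.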
mxbi
  • Wow, iterating between evens and odds, that was sensational in the approach! Now I will delve into this when the total number of rows is odd, because it would cause looping error, if you want to add this to the answer, that would be nice, if not, no problem. – Digital Farmer Apr 15 '22 at 22:11
  • And @mxbi The reason for using it is because I'm going to send the data to an API, which I can only make two calls at a time at most, I can't exceed two calls at a time. – Digital Farmer Apr 15 '22 at 22:11
  • @BrondbyIF I added another solution using `multiprocessing.Pool` which might be more suited for your use-case. – mxbi Apr 15 '22 at 22:56
  • Hi mate, I thought better and thought it was important to create a new question for people who have the doubt in the future specifically about getting around when we don't have total even lines, if you want to create your answer: https://stackoverflow.com/q/71889684/11462274 – Digital Farmer Apr 15 '22 at 23:01