I have a huge dataframe that looks like

df = pd.DataFrame([
    [1, "1/1/2023", 1, 3],
    [1, "1/1/2023", 2, 2],
    [1, "1/1/2023", 3, 1],
    [1, "1/1/2023", 4, 4],
    [2, "11/9/2022", 1, 2],
    [2, "11/9/2022", 2, 3],
    [2, "11/9/2022", 3, 1],
    [3, "17/4/2022", 5, 4],
    [3, "17/4/2022", 2, 1],
    [3, "17/4/2022", 3, 2],
    [3, "17/4/2022", 4, 3],
    [4, "1/3/2022", 1, 1],
    [4, "1/3/2022", 2, 2],
    [5, "1/1/2021", 1, 2],
    [5, "1/1/2021", 2, 3],
    [5, "1/1/2021", 3, 1],
], columns=["Race_ID", "Date", "Student_ID", "Rank"])
Race_ID   Date           Student_ID      Rank  
1         1/1/2023       1               3     
1         1/1/2023       2               2     
1         1/1/2023       3               1     
1         1/1/2023       4               4     
2         11/9/2022      1               2     
2         11/9/2022      2               3     
2         11/9/2022      3               1     
3         17/4/2022      5               4     
3         17/4/2022      2               1     
3         17/4/2022      3               2     
3         17/4/2022      4               3     
4         1/3/2022       1               1     
4         1/3/2022       2               2     
5         1/1/2021       1               2     
5         1/1/2021       2               3     
5         1/1/2021       3               1     

And I have the following subroutine:

for idx, (race, date, student, rank, _) in df.iterrows():
    this_race_competitors = df.loc[(df['Race_ID'] == race) & (df['Student_ID'] != student)]['Student_ID']
    other_races = df.loc[(df['Student_ID'] == student) & (df['Race_ID'] > race)][['Race_ID', 'Date', 'Rank']]

where this_race_competitors stores the Student_IDs of all other competitors in the current race, and other_races stores the Race_ID, Date and Rank of every race with a larger Race_ID in which the student of interest took part.

The above subroutine works for the above toy dataframe just fine. However, when I use the code for my actual dataframe, the following error shows up:

ValueError: too many values to unpack (expected 5)

I tried googling the above error but couldn't resolve it. Thanks in advance.

Nayr borcherds
  • Maybe use "for idx, *rest in df.iterrows()" and print rest to see the structure of each iteration. The most likely cause of the error is that the row unpacks into more than 5 values, so you will need to handle the remaining ones. – P.Jo Mar 06 '23 at 11:04
  • Can you please clarify why you added "_" in the for loop? I think the error is telling you that it looks for a fifth column, but your DF has only 4. – dramarama Mar 06 '23 at 11:07
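
The diagnostic suggested in the comments can be sketched roughly as follows. This is only an illustration: the tiny frame and its "Extra" column are made up to stand in for a wider real dataframe, and the starred unpacking assumes the four columns of interest come first.

```python
import pandas as pd

# Hypothetical wide frame: the four columns from the question plus one
# made-up "Extra" column standing in for the additional real columns.
df = pd.DataFrame(
    [[1, "1/1/2023", 1, 3, "x"], [1, "1/1/2023", 2, 2, "y"]],
    columns=["Race_ID", "Date", "Student_ID", "Rank", "Extra"],
)

widths = []
for idx, row in df.iterrows():
    # Each row is a Series with one entry per column, so its length
    # tells you how many names the unpacking needs.
    widths.append(len(row))
    # Starred unpacking collects whatever is left over into a list.
    race, date, student, rank, *rest = row

print(widths)
print(rest)
```

If the printed widths are larger than the number of names on the left-hand side, a plain tuple unpacking raises "too many values to unpack".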

1 Answer

The toy dataframe has four columns, so unpack four values instead of five:

for idx, (race, date, student, rank) in df.iterrows():
    this_race_competitors = df.loc[(df['Race_ID'] == race) & (df['Student_ID'] != student)]['Student_ID']
    other_races = df.loc[(df['Student_ID'] == student) & (df['Race_ID'] > race)][['Race_ID', 'Date', 'Rank']]

  • de Gosson de Varennes, thanks for your help. Now it has a new error: ValueError: too many values to unpack (expected 4) – Nayr borcherds Mar 06 '23 at 15:41
  • Are ALL your rows of the same length? Check that with ```for idx, row in df.iterrows(): print(row) race, date, student, rank = row``` – Serge de Gosson de Varennes Mar 06 '23 at 15:55
  • I tried to run the above code to check, but the same error message shows up again: ValueError: too many values to unpack (expected 4). All rows should be of the same length though. My original dataframe is of shape (217333, 48). – Nayr borcherds Mar 06 '23 at 17:17
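
The shape (217333, 48) in the last comment suggests why the unpacking fails: each row carries 48 values, not 4. One possible way around it, sketched here under the assumption that the real dataframe contains the four column names from the question, is to select those columns by name before iterating. The small frame and its "Extra" column are hypothetical, and itertuples is swapped in for iterrows so that each row arrives as a plain tuple (the index is dropped, unlike in the original loop).

```python
import pandas as pd

# Hypothetical stand-in for the wide frame: the question's four columns
# plus one made-up "Extra" column.
df = pd.DataFrame(
    [
        [1, "1/1/2023", 1, 3, "a"],
        [1, "1/1/2023", 2, 2, "b"],
        [2, "11/9/2022", 1, 2, "c"],
    ],
    columns=["Race_ID", "Date", "Student_ID", "Rank", "Extra"],
)

# Iterate over the needed columns only, so every row has exactly 4 values.
needed = ["Race_ID", "Date", "Student_ID", "Rank"]
results = {}
for race, date, student, rank in df[needed].itertuples(index=False, name=None):
    # Other students in the same race.
    this_race_competitors = df.loc[
        (df["Race_ID"] == race) & (df["Student_ID"] != student), "Student_ID"
    ]
    # Races with a larger Race_ID in which this student also took part.
    other_races = df.loc[
        (df["Student_ID"] == student) & (df["Race_ID"] > race),
        ["Race_ID", "Date", "Rank"],
    ]
    results[(race, student)] = (list(this_race_competitors), len(other_races))
```

The filters still run against the full df, so the extra columns stay available if they are needed inside the loop.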