1

Here's the data after the preliminary data cleaning.

year country employees
2001 US 9
2001 Canada 81
2001 France 22
2001 Japan 31
2001 Chile 7
2001 Mexico 15
2001 Total 165
2002 US 5
2002 Canada 80
2002 France 20
2002 Japan 30
2002 Egypt 35
2002 Total 170
... ... ...
2010 US 32
... ... ...

What I want to get is the table below, which is summing up all countries except "US, Canada, France, and Japan" into 'others'. The list of countries varies every year from 2001 to 2010 so I want to use a for loop with if condition to loop over every year.

year country employees
2001 US 9
2001 Canada 81
2001 France 22
2001 Japan 31
2001 Others 22
2001 Total 165
2002 US 5
2002 Canada 80
2002 France 20
2002 Japan 30
2002 Others 35
2002 Total 170

Any leads would be greatly appreciated!

Rerepython
  • 17
  • 1
  • 2
  • What you are looking for is in this previous question https://stackoverflow.com/questions/65964178/add-sum-to-all-grouped-rows-in-pandas-dataframe – Reda El Hail Oct 19 '22 at 20:53

1 Answers1

0

You may consider dropping Total from your dataframe.

However, as stated, your question can be solved by using Series.where to map away values that you don't recognize:

country = df["country"].where(df["country"].isin(["US", "Canada", "France", "Japan", "Total"]), "Others")
df.groupby([df["year"], country]).sum(numeric_only=True)
Code Different
  • 90,614
  • 16
  • 144
  • 163