0

To start off, a little about my problem.

I have a data frame of winners of the champions league cup indexed by years.

Like this, note team names are strings.

year    team         need this         year    team      wins to date

1       team1                           1       team1          1
2       team2                           2       team2          1
3       team1                           3       team1          2
4       team3                           4       team3          1

I would like to create a new column with cumulative wins, so that i can then plot a line graph of each team and the number of times they have won the cup. ie I need the year on the x axis and the cumulative frequency on the y, with 4 different lines for the top 4 teams.

There is probably an easy way using the count function and a loop but I'm fairly new to python and don't have a very good grasp of these. Any help would be much appreciated!

kennytm
  • 510,854
  • 105
  • 1,084
  • 1,005
Frank Lee
  • 23
  • 3

2 Answers2

1

Look at pandas documentation on groupby, transform, cumcount, and cumsum (http://pandas.pydata.org/pandas-docs/stable/groupby.html). You can groupby team and use transform to do a cumsum on bool result of whether the team is null or not.

df

Input:

   year   team
0     1  team1
1     2  team2
2     3  team1
3     4  team3

Do transformation

df['wins to date'] = df.groupby('team').transform(lambda x: x.notnull().cumsum())
df

or as DSM suggest:

df['wins to date'] = df.groupby('team').cumcount()+1

output

   year   team  wins to date
0     1  team1             1
1     2  team2             1
2     3  team1             2
3     4  team3             1
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
0

I think you just want to use groupby with Series.cumsum. As in:

df.groupby('team').cumsum()
splinter
  • 3,727
  • 8
  • 37
  • 82
  • how would i apply this to string values of the team? the cumsum function only appends the strings together (team1team1team1, rather than just digit 3). I'd like to count the number. Is there a way to do this? – Frank Lee Apr 08 '17 at 19:48
  • but df['wins to date'] are integers, no? Isn't that the column you want to cumulatively sum? – splinter Apr 08 '17 at 19:49
  • oh sorry 'wins to date' is a column id like to create, it hasnt got any digits in it, id like to fill it with the number of times 'team1' won, up to thjat row – Frank Lee Apr 08 '17 at 20:01
  • i can't get it to work, the values are NaN. i need to count the recurrences of the string 'team1' – Frank Lee Apr 08 '17 at 20:26
  • i tried: groupby and count as well, but the values stay as 1 or NaN, :s – Frank Lee Apr 08 '17 at 20:27