x = (df.groupby(['id_gamer'])[['sucess', 'nb_games']]
       .shift(periods=1).cumsum()
       .apply(lambda row: row.sucess / row.nb_games, axis=1))

In the code above, I group a pandas.DataFrame by gamer in order to obtain, for each gamer and each game, a shifted column of ratios: the gamer's success rate relative to the number of games he has played.

It returns a pandas.core.series.Series object that looks like this:

+---------------+----------------+
|     Index     | Computed_ratio |
+---------------+----------------+
| id_game_date  | NaN            |
| id_game2_date | 0.30           |
| id_game3_date | 0.40           |
| id_game_date  | NaN            |
| id_game4_date | 0.50           |
| ...           | ...            |
+---------------+----------------+

The NaN values can be read as the boundary between gamers. As you can see, the first gamer and the second one met in one game, id_game_date, so that index value appears twice. This is why I would like the id_gamer column to appear in the result, so that I can merge it back into the DataFrame the data came from.
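A minimal sketch of one way to keep id_gamer next to the ratio is shown here. The sample data, the index name id_game, and the output column name ratio are assumptions made for illustration; only id_gamer, sucess and nb_games come from the question. In the sketch the cumulative sum is applied per gamer through a second groupby, whereas in the original chain `.cumsum()` is called on the plain DataFrame returned by `shift`, so it accumulates over all rows rather than per gamer.

```python
import pandas as pd

# Hypothetical sample data: gamers "a" and "b" meet in game "g1",
# so the game id index is not unique (as in the question).
df = pd.DataFrame(
    {
        "id_gamer": ["a", "a", "a", "b", "b"],
        "sucess": [1, 0, 1, 0, 1],
        "nb_games": [1, 1, 1, 1, 1],
    },
    index=pd.Index(["g1", "g2", "g3", "g1", "g4"], name="id_game"),
)

cols = ["sucess", "nb_games"]
keys = df["id_gamer"].to_numpy()  # positional group labels, one per row

# Shift within each gamer so a row only sees that gamer's earlier games.
shifted = df.groupby("id_gamer")[cols].shift(periods=1)

# Cumulative sum within each gamer (not over the whole frame).
running = shifted.groupby(keys).cumsum()

# Vectorised ratio, attached positionally so id_gamer stays alongside it.
ratio = (running["sucess"] / running["nb_games"]).to_numpy()
result = df[["id_gamer"]].assign(ratio=ratio)
```

Because the values are attached by position rather than by index label, the duplicated game id does not cause an alignment problem in this sketch.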

To be honest, I have an idea for a solution: simply do not use the game id as the index. Then each row would be indexed uniquely and, I guess, there would be no conflict when I do the merge. But I would like to know whether it is possible with the pattern shown here.
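The idea of dropping the game id from the index could be sketched as follows. This is only an illustration under assumptions: the sample data, the index name id_game, and the column name ratio are hypothetical, and the per-gamer cumulative sum is one reading of the intended computation.

```python
import pandas as pd

# Hypothetical sample data; "id_game" starts out as a non-unique index.
df = pd.DataFrame(
    {
        "id_gamer": ["a", "a", "b", "b"],
        "sucess": [1, 0, 0, 1],
        "nb_games": [1, 1, 1, 1],
    },
    index=pd.Index(["g1", "g2", "g1", "g3"], name="id_game"),
)

# Move the game id into a regular column; the index becomes 0..n-1.
flat = df.reset_index()

cols = ["sucess", "nb_games"]
shifted = flat.groupby("id_gamer")[cols].shift(periods=1)
running = shifted.groupby(flat["id_gamer"]).cumsum()

ratio = (running["sucess"] / running["nb_games"]).rename("ratio")

# The index is unique now, so an index-based join is unambiguous.
merged = flat.join(ratio)
```

After `reset_index()`, every row has its own label, so the computed Series lines up with exactly one row of the source data.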

NB: I have already tried the solutions presented in this topic, but none of them work, presumably because the functions shown there are aggregations and mine, cumsum(), is not. If I use an aggregating function such as sum() (with a different code pattern; do not try it with the code I gave above, or it will raise an error), id_gamer does appear, but the result does not match what I expect.
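The difference described here can be reproduced on a small example. The data below is hypothetical; the sketch only contrasts how an aggregation such as `sum()` and a transformation such as `cumsum()` label their results after a groupby.

```python
import pandas as pd

# Hypothetical sample data with a repeated game id in the index.
df = pd.DataFrame(
    {"id_gamer": ["a", "a", "b"], "sucess": [1, 0, 1]},
    index=pd.Index(["g1", "g2", "g1"], name="id_game"),
)

grouped = df.groupby("id_gamer")["sucess"]

# Aggregation: one row per group, labelled by the group key id_gamer.
totals = grouped.sum()

# Transformation: one row per input row, labelled by the original index,
# so id_gamer does not appear in the result.
running = grouped.cumsum()
```

This is why recipes written for aggregations do not carry over directly: a transformation returns a result shaped like its input, and the group key has to be reattached separately.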

m13op22
AvyWam

0 Answers