x = (df.groupby(['id_gamer'])[['sucess', 'nb_games']].shift(periods=1).cumsum()
     .apply(lambda row: row.sucess / row.nb_games, axis=1))
In the code above, I run a groupby on a pandas.DataFrame to obtain, for each gamer and each game, a shifted column of results expressed as a ratio: the gamer's rate of success relative to the number of games he played.
It returns a pandas.core.series.Series object like this:
+---------------+----------------+
| Index | Computed_ratio |
+---------------+----------------+
| id_game_date | NaN |
| id_game2_date | 0.30 |
| id_game3_date | 0.40 |
| id_game_date | NaN |
| id_game4_date | 0.50 |
| ... | ... |
+---------------+----------------+
The NaN values mark the boundary between gamers. As you can see, the first gamer and the second one met in one game, id_game_date, so that index label appears twice. This is why I would like the gamer column, id_gamer, to appear in the result, so that I can merge it back into the dataframe the data came from.
To be honest, I already have one idea for a solution: do not use the game id as the index. Each row would then be indexed uniquely and there would be no conflict when I merge, I guess. But I would like to know whether it is possible with the pattern shown here.
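For reference, the idea of dropping the game id from the index could be sketched roughly as follows. The sample values and the names id_game, flat, totals, previous and computed_ratio are made up for illustration. The sketch also takes the cumulative sum per gamer before shifting, which is an assumption about the intended calculation rather than a copy of the original chain.

```python
import pandas as pd

# Hypothetical sample data: two gamers who met in the same game "g1".
df = pd.DataFrame(
    {
        "id_gamer": ["a", "a", "a", "b", "b"],
        "sucess": [3, 5, 2, 4, 6],
        "nb_games": [10, 10, 10, 8, 8],
    },
    index=pd.Index(["g1", "g2", "g3", "g1", "g4"], name="id_game"),
)

# Turn the game id into an ordinary column so every row gets a unique label.
flat = df.reset_index()

# Running totals per gamer, then shifted by one row within each gamer so
# that a row only sees the games played before it.
totals = flat.groupby("id_gamer")[["sucess", "nb_games"]].cumsum()
previous = totals.groupby(flat["id_gamer"]).shift(1)

# The result shares flat's unique index, so it can be assigned directly
# and id_gamer stays next to the ratio.
flat["computed_ratio"] = previous["sucess"] / previous["nb_games"]
print(flat)
```

With the ratio stored as a column of the same frame, no merge is needed at all, which sidesteps the duplicate-label problem.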
NB: I already tried the solutions presented in this topic, but none of them work, presumably because the functions shown there are aggregations and mine, cumsum(), is not. If I use an aggregating function such as sum() (with a different code pattern; do not try it with the one I gave you or it will raise an error), id_gamer does appear, but the result is not what I expect.
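The difference described above can be reproduced on a small frame. This is only a sketch with invented data: an aggregation such as sum() labels its output with the grouping key, while a transformation such as cumsum() keeps the original index. One possible way to keep id_gamer within the groupby pattern is to move it into the index beforehand. The names id_game, keyed, totals, previous and ratio are hypothetical, and the cumulative sum is taken per gamer before the shift, which is an assumption about the intent.

```python
import pandas as pd

# Hypothetical sample data, indexed by game id; "g1" is shared by both gamers.
df = pd.DataFrame(
    {
        "id_gamer": ["a", "a", "a", "b", "b"],
        "sucess": [3, 5, 2, 4, 6],
        "nb_games": [10, 10, 10, 8, 8],
    },
    index=pd.Index(["g1", "g2", "g3", "g1", "g4"], name="id_game"),
)

cols = ["sucess", "nb_games"]

# Aggregation: one row per group, labelled by the grouping key.
summed = df.groupby("id_gamer")[cols].sum()
print(summed.index.name)

# Transformation: one row per input row, labelled by the original index.
running = df.groupby("id_gamer")[cols].cumsum()
print(running.index.name)

# Moving id_gamer into the index first keeps it on the transformed result.
keyed = df.set_index("id_gamer", append=True)
totals = keyed.groupby(level="id_gamer")[cols].cumsum()
previous = totals.groupby(level="id_gamer").shift(1)
ratio = (previous["sucess"] / previous["nb_games"]).rename("computed_ratio")
print(ratio)
```

The resulting Series is indexed by the pair (id_game, id_gamer), which is unique per row, so calling reset_index() on it gives two key columns that can be used for a merge.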