4

I have a DataFrame consisting of a set of data from 2 player games. Each game (with unique id) has a number of rounds where each of the 2 players chose some action. It looks like the following (I've removed some of the rows for clarity):

    gameId  round   player  action
0   Afom9bWqYBgZXXXN8   1   PvQ8B5kuA9Fbq9N59   1
1   Afom9bWqYBgZXXXN8   1   PJmJgrqusFZ8KRShQ   0
2   Afom9bWqYBgZXXXN8   2   PvQ8B5kuA9Fbq9N59   0
3   Afom9bWqYBgZXXXN8   2   PJmJgrqusFZ8KRShQ   0
4   Afom9bWqYBgZXXXN8   3   PJmJgrqusFZ8KRShQ   0
5   Afom9bWqYBgZXXXN8   3   PvQ8B5kuA9Fbq9N59   0
20  QdZM4yPMnjGj8f25R   1   Q6knaWEruc6BDPQT7   1
21  QdZM4yPMnjGj8f25R   1   xnAjMcWaFRpfBbukz   1
22  QdZM4yPMnjGj8f25R   2   xnAjMcWaFRpfBbukz   1
23  QdZM4yPMnjGj8f25R   2   Q6knaWEruc6BDPQT7   0
24  QdZM4yPMnjGj8f25R   3   Q6knaWEruc6BDPQT7   1
25  QdZM4yPMnjGj8f25R   3   xnAjMcWaFRpfBbukz   1
40  riMD6ctT8DLwdhHpE   1   EKkrMpMqy2PRLm7ur   1
41  riMD6ctT8DLwdhHpE   1   EqbbmngPfZBEmPTzq   1
42  riMD6ctT8DLwdhHpE   2   EKkrMpMqy2PRLm7ur   1
43  riMD6ctT8DLwdhHpE   2   EqbbmngPfZBEmPTzq   1
44  riMD6ctT8DLwdhHpE   3   EqbbmngPfZBEmPTzq   1
45  riMD6ctT8DLwdhHpE   3   EKkrMpMqy2PRLm7ur   1
60  hyEjkAg5K4WpubJA9   1   7CHpY4setLKb9ssnN   1
61  hyEjkAg5K4WpubJA9   1   hbud2J3YvitEhj4xZ   0
62  hyEjkAg5K4WpubJA9   2   hbud2J3YvitEhj4xZ   0
63  hyEjkAg5K4WpubJA9   2   7CHpY4setLKb9ssnN   0
64  hyEjkAg5K4WpubJA9   3   7CHpY4setLKb9ssnN   0
65  hyEjkAg5K4WpubJA9   3   hbud2J3YvitEhj4xZ   1
80  ay5pmpeNcwqHJ8JBH   1   tWA9ZxSnKpZyWwYsQ   1
81  ay5pmpeNcwqHJ8JBH   1   2qiHdJgL4WQe5qrHQ   1
82  ay5pmpeNcwqHJ8JBH   2   2qiHdJgL4WQe5qrHQ   1
83  ay5pmpeNcwqHJ8JBH   2   tWA9ZxSnKpZyWwYsQ   1
84  ay5pmpeNcwqHJ8JBH   3   tWA9ZxSnKpZyWwYsQ   1
85  ay5pmpeNcwqHJ8JBH   3   2qiHdJgL4WQe5qrHQ   1

I'd like to add a new column to the DataFrame that contains, for each player's action in a given round, his/her opponent's action in the previous round of the same game, if any. What's a fast, succinct way to do this instead of using a really long (and slow) loop?

Note that within each (gameId, round) key there are only two players with different ids. Dataframe.merge seems like a close match (example), but it would require something like the following:

df.merge(df_copy, left_on=['gameId', 'round', 'player'], \
         right_on=['gameId', df_copy.round - 1, df.player != df_copy.player])

but it can't support the df.player != df_copy.player in a self-join condition.

Andrew Mao
  • 35,740
  • 23
  • 143
  • 224

1 Answers1

2

I think you should start by replacing the player codes with generic aliases, e.g. 1 and 2. You can do it as follows:

s = df.groupby(['gameId', 'player']).size().reset_index(0, drop=True)
s[:] = np.arange(len(s)) % 2 + 1
df['player_alias'] = s.reindex(df.player).values

You can then construct and index of the previous round and opposing player for each row and map it to the corresponding action:

prev_round = df['round'] - 1 
opp_player = 3 - df.player_alias   # effectively maps 2 to 1 and 1 to 2

ix = pd.MultiIndex.from_arrays([df.gameId, prev_round, opp_player])
df['opp_prev_action'] = df.set_index(['gameId', 'round', 'player_alias']
                                     ).reindex(ix).action.values

Note that for round 1, prev_round is 0 which leads to nans in the desired column.

JoeCondron
  • 8,546
  • 3
  • 27
  • 28
  • Not all rounds have moves by both players (missing some data), so how could we make this robust to that? – Andrew Mao Oct 06 '15 at 20:38
  • This solution won't work for other cases, where you have a different (i.e not fixed/3) number of players though... No? – GrimSqueaker Jul 12 '23 at 08:55
  • The question is specifically about 2-player games. The solution first makes rebases the player IDs for each game to 1 or 2. Then a code the opposing player is trivial (1 -> 2, 2 -> 1). For more than 2 players,, the questions itself does not generalize to > 2 players as a choice would have to be made on which opposing player from which to choose the action. You could say you want the previous action from all opposing players but that would mean introducing a number of columns equal to the number of players - 1. – JoeCondron Jul 13 '23 at 14:06