I really need to speed up some R code. I have a large dataset from a particular sport. Each row in the data frame represents some type of action in the game. For each game (game_id) we have two teams (team_id) that take part in the game. time_ref orders the actions chronologically within each game. action_id is the type of action in the game. player_off is set to TRUE or FALSE and is linked to action_id=3. action_id=3 represents a player getting a card, and player_off is TRUE/FALSE depending on whether the player was sent off when they got that card (shown as 1/0 in the data below, and NA for all other actions). Example data.frame:
> df
game_id team_id action_id player_off time_ref
100 10 1 NA 1000
100 10 1 NA 1001
100 10 1 NA 1002
100 11 1 NA 1003
100 11 2 NA 1004
100 11 1 NA 1005
100 10 3 1 1006
100 11 1 NA 1007
100 10 1 NA 1008
100 10 1 NA 1009
101 12 3 0 1000
101 12 1 NA 1001
101 12 1 NA 1002
101 13 2 NA 1003
101 13 3 1 1004
101 12 1 NA 1005
101 13 1 NA 1006
101 13 1 NA 1007
101 12 1 NA 1008
101 12 1 NA 1009
What I need is another column in the data frame that gives me TRUE or FALSE depending on whether both teams had an equal or unequal number of players on the field while each action (row) took place.
So game_id=100 had an action_id=3 & player_off=1 for team_id=10 at time_ref=1006. So we know the teams had an equal number of players on the field up to that point but an unequal number for the rest of the game (time_ref>1006). The same thing occurred in game_id=101.
This is an example of the data frame with the extra column I would like to have:
>df
game_id team_id action_id player_off time_ref is_even
100 10 1 NA 1000 1
100 10 1 NA 1001 1
100 10 1 NA 1002 1
100 11 1 NA 1003 1
100 11 2 NA 1004 1
100 11 1 NA 1005 1
100 10 3 1 1006 1
100 11 1 NA 1007 0
100 10 1 NA 1008 0
100 10 1 NA 1009 0
101 12 3 0 1000 1
101 12 1 NA 1001 1
101 12 1 NA 1002 1
101 13 2 NA 1003 1
101 13 3 1 1004 1
101 12 1 NA 1005 0
101 13 1 NA 1006 0
101 13 1 NA 1007 0
101 12 1 NA 1008 0
101 12 1 NA 1009 0
So you can see that in game_id=100 a player was sent off at time_ref=1006, so all rows up to and including that one are marked as is_even=1 and all subsequent rows are marked as uneven, or 0. Similarly for game_id=101 at time_ref=1004.
What is the most efficient way of achieving this extra column? Preferably not using for loops.
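To make the rule above concrete, here is a minimal base R sketch of one loop-free way it could be computed, using ave() with cumsum(). It rests on several assumptions that are not stated in the data: each game has exactly two teams, a sending-off only affects the rows after it, and the helper names (sent_off, first_team, side, balance, before) are made up for illustration. It is not tuned or benchmarked for a large dataset.

```r
# Rebuild the example data from the question.
df <- data.frame(
  game_id    = rep(c(100, 101), each = 10),
  team_id    = c(10, 10, 10, 11, 11, 11, 10, 11, 10, 10,
                 12, 12, 12, 13, 13, 12, 13, 13, 12, 12),
  action_id  = c(1, 1, 1, 1, 2, 1, 3, 1, 1, 1,
                 3, 1, 1, 2, 3, 1, 1, 1, 1, 1),
  player_off = c(NA, NA, NA, NA, NA, NA, 1, NA, NA, NA,
                 0, NA, NA, NA, 1, NA, NA, NA, NA, NA),
  time_ref   = rep(1000:1009, times = 2)
)

# Make sure rows are in chronological order within each game.
df <- df[order(df$game_id, df$time_ref), ]

# TRUE only for a card that resulted in a sending-off; %in% turns NA into FALSE.
sent_off <- df$action_id == 3 & df$player_off %in% 1

# Assumption: two teams per game. Give the first team seen in a game +1
# and the other team -1, so a running sum of 0 means equal numbers.
first_team <- ave(df$team_id, df$game_id, FUN = function(x) x[1])
side <- ifelse(df$team_id == first_team, 1, -1)

# Running balance of sendings-off per game, including the current row.
balance <- ave(side * sent_off, df$game_id, FUN = cumsum)

# State of the game just before the current action took effect.
before <- balance - side * sent_off

df$is_even <- as.integer(before == 0)
print(df)
```

Because the balance is signed by team, a later sending-off on the other team would bring is_even back to 1, which goes slightly beyond the example (where each game has a single sending-off). If only "has anyone been sent off yet" matters, the side vector can be dropped and cumsum applied to sent_off directly.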