0

I have two datatables. One with scores over time, and another I want to populate with the sum of scores over a specific number of games.

> dt = data.table(game=c(1, 2, 3, 1, 2, 3, 1, 2, 3),
    player=c("ace", "ace", "ace", "bob", "bob", "bob", "casey", "casey", "casey"),
    points=c(5, 2, 3, 2, 6, 7, 3, 4, 2))
> dt
   game player points
1:    1    ace      5
2:    2    ace      2
3:    3    ace      3
4:    1    bob      2
5:    2    bob      6
6:    3    bob      7
7:    1  casey      3
8:    2  casey      4
9:    3  casey      2

> out = data.table(start=c(1, 1, 3),
    end=c(2, 2, 3),
    player=c("ace", "bob", "casey"))
> out
   start end player
1:     1   2    ace
2:     1   2    bob
3:     3   3  casey
> ???
> ???
> out 
   start end player points
1:     1   2    ace      7
2:     1   2    bob      8
3:     3   3  casey      2

The non-R way would be to iterate each row, filter for the player and the game numbers such that the game number in dt is greater than start and less than end, then sum the points column from dt and put it into a new column in out.

What is the best way to do this in R?

ZtoYi
  • 180
  • 1
  • 1
  • 11

1 Answers1

4

You can use non-equi join in data.table and then sum points which overlap in the range.

library(data.table)

dt[out, .(start, end, game, player, points), 
         on=.(player, game>=start, game<=end)][
       , .(points = sum(points)), by = .(start, end, player)]

#   start end player points
#1:     1   2    ace      7
#2:     1   2    bob      8
#3:     3   3  casey      2
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213