
In R I have a data.frame called data with the columns user, game and number, where number is the number of times that user has played that game.

For a single user, the data looks like this:

user  game  number
1     1     110
1     2     95
1     3     263
1     4     55
1     5     24
1     6     10

Now I'm looking for games that are dominant. A dominant game is a game that accounts for more than 25% of a user's plays. To find which games have been played more than 25% of the time in this case, I type

u=c()
for(j in 1:6) {
  # Check if the percentage is higher than 25
  if(data[j,3] / sum(data[,3]) > 0.25) {
    u[j]=data[j,2]
  }
  else{u[j]=0}

But when I type this in R I get this strange message

Error in if(....): missing value where TRUE/FALSE needed
Ole Petersen
  • 1
    add a `}` at the end of your code and it will run fine – etienne Nov 27 '15 at 11:05
  • 3
    And it might be best to use `ifelse(data[,3]>0.25*colSums(data)[[3]],data[,2],0)` instead of a for loop. This is also probably a question which could be easily solved with `data.table` – etienne Nov 27 '15 at 11:07
  • Agree with @etienne, this is done much more easily with `data.table`, avoiding the for loop. Something like `dt = data.table(data)` and then `u = dt[dt$number > 0.25 * sum(dt$number)]` seems to answer the question as given. – John Faben Nov 27 '15 at 11:41
  • but you should read [this answer](http://stackoverflow.com/questions/20110092/r-add-a-new-column-of-the-sum-by-group) if you have multiple users in the table, to get sums per user. – John Faben Nov 27 '15 at 11:42

1 Answer


You forgot to add > 0.25. Also, it is better to initialize the u vector to its full length before the loop; as written, u starts with length 0 and is grown by every u[j] assignment.
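As a minimal sketch of the corrected loop: the data frame below is rebuilt from the six rows in the question only so that the snippet runs on its own, and total is a helper name introduced here. The result vector is allocated up front and the loop is closed properly.

```r
# Sample data copied from the question (one user, six games)
data <- data.frame(
  user   = rep(1, 6),
  game   = 1:6,
  number = c(110, 95, 263, 55, 24, 10)
)

total <- sum(data$number)   # total plays for this user
u <- numeric(nrow(data))    # preallocate: one slot per row, all 0

for (j in seq_len(nrow(data))) {
  # Keep the game id when its share of plays exceeds 25%
  if (data$number[j] / total > 0.25) {
    u[j] <- data$game[j]
  }
}

u
```

With this data only game 3 passes (263 of 557 plays), so u holds 3 in the third position and 0 elsewhere. The message "missing value where TRUE/FALSE needed" is what R reports when the condition of if() evaluates to NA, for example when number contains an NA or when j runs past the last row, so it is worth checking the real data for both.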

A good way to solve problems like this is to use the dplyr package

   library(dplyr)
   # %>% must end a line, not start one, or R stops parsing after the first line
   newdata <- data %>%
     group_by(user) %>%
     mutate(perc = number / sum(number)) %>%
     filter(perc > 0.25)
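For comparison, the same per-user filter can probably be written in base R with ave(). This is only a sketch: the rows for user 1 come from the question, while the two rows for user 2 are made up for illustration, and dominant is a name introduced here.

```r
# Rows for user 1 come from the question; user 2 is made up for illustration
data <- data.frame(
  user   = c(1, 1, 1, 1, 1, 1, 2, 2),
  game   = c(1, 2, 3, 4, 5, 6, 1, 2),
  number = c(110, 95, 263, 55, 24, 10, 30, 70)
)

# ave() returns, for every row, the sum of number within that row's user
data$perc <- data$number / ave(data$number, data$user, FUN = sum)

# Keep only the games with more than 25% of their user's plays
dominant <- data[data$perc > 0.25, ]
dominant
```

Here game 3 is kept for user 1, and both games are kept for the hypothetical user 2, since each share is computed against that user's own total.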
nist