2

I have a data frame with data as follows (although my data set is much bigger)

ID  Count  Size
1   1      35
1   2      42
1   2      56
2   3      25
2   5      52
2   2      62

etc....

I would like to extract the total count for each ID but split for when the size variable is either <50 or <=50

So far I have done this to get the cumulative count based on the unique id

cbind(aggregate(Count~ID, sum, data=df)

To produce this

ID Count
1  5
2  10

But I want to produce something like this instead

ID  <50  >=50
1   3    2
2   3    7

I've tried searching on how best to do this and am sure it is really simple but I'm struggling on how best to achieve this...any help would be great thanks!

J. Cee
  • 49
  • 5

2 Answers2

1

We could use data.table. Convert the 'data.frame' to 'data.table' (setDT(df1)), grouped by 'ID', we get the sum of 'Count' based on the logical indexes ('Size < 50,Size >=50`)

library(data.table)
setDT(df1)[,list(`<50` = sum(Count[Size <50]), 
            `>=50` = sum(Count[Size>=50])) , by = ID]
#   ID <50 >=50
#1:  1   3    2
#2:  2   3    7

A similar option with dplyr is

library(dplyr)
df1 %>%
    group_by(ID) %>% 
    summarise(`<50` = sum(Count[Size <50]),
             `>=50` = sum(Count[Size>=50]))

NOTE: It is better to name the columns as less50, greaterthanEq50 instead of the names suggested in the expected output.

akrun
  • 874,273
  • 37
  • 540
  • 662
0

Continue your idea, you can actually aggregate on df[df$Size<50,] instead of df, and do this again for >=50 then merge.

d1 = aggregate(Count~ID,sum,data=df[df$Size<50,])
d2 = aggregate(Count~ID,sum,data=df[df$Size>=50,])
merge(d1,d2,by="ID",all=TRUE)

This is just based on what you already did, but not the best I guess..

Gentlezerg
  • 286
  • 2
  • 9