This is my data frame:
x day month
5 1 1
4 1 1
1 2 1
3 2 1
5 1 2
2 1 2
5 2 2
3 2 2
I need to take the sum of the x values for each day in each month. I have already tried:
tapply(DF$x, DF$day, max)
but it does not give the right answer.
Try the data.table package:
library(data.table)
DT <- data.table(DF)
DT[, list(Sum=sum(x)), by = c("day","month")]
   day month Sum
1:   1     1   9
2:   2     1   4
3:   1     2   7
4:   2     2   8
Or use the sqldf package:
library(sqldf)
sqldf("select day, month, sum(x) as sum from DT group by day, month")
Or use the base aggregate function:
aggregate(DT$x, FUN=sum, by = list(DT$day, DT$month))
A cleaner way, suggested by Frank:
aggregate(x~day+month, DT, sum)
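For anyone who wants to run the base version without any packages, here is a minimal sketch. It assumes the data frame is rebuilt by hand from the values in the question; the name res is only illustrative.

```r
# Rebuild the example data from the question.
DF <- data.frame(
  x     = c(5, 4, 1, 3, 5, 2, 5, 3),
  day   = c(1, 1, 2, 2, 1, 1, 2, 2),
  month = c(1, 1, 1, 1, 2, 2, 2, 2)
)

# Formula interface: sum x within every day/month combination.
# The result keeps the original column names (day, month, x).
res <- aggregate(x ~ day + month, data = DF, FUN = sum)
print(res)
```

The formula form labels the output columns by variable name, whereas the by = list(...) form falls back to Group.1, Group.2 and x unless the list elements are named.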
Or use the dplyr package (as suggested by Frank):
library(dplyr)
DT %>%
group_by(day, month) %>%
summarise(Sum = sum(x))
Since the question title is about tapply and the OP's post does not show the expected output, here is a cross-tabular version: with tapply, place the grouping variables in a list and specify FUN as sum.
with(DF, tapply(x, list(day, month), FUN=sum))
#  1 2
#1 9 7
#2 4 8
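To see why the attempt in the question falls short, it may help to run both calls side by side. This is a sketch that assumes the data frame is rebuilt from the question's values; the names attempt and fixed are illustrative only.

```r
# Rebuild the example data from the question.
DF <- data.frame(
  x     = c(5, 4, 1, 3, 5, 2, 5, 3),
  day   = c(1, 1, 2, 2, 1, 1, 2, 2),
  month = c(1, 1, 1, 1, 2, 2, 2, 2)
)

# The call from the question: it groups by day only and takes the
# maximum, so month is ignored and nothing is summed.
attempt <- tapply(DF$x, DF$day, max)

# Grouping by both variables and summing gives a day-by-month matrix.
# Naming the list elements labels the two dimensions.
fixed <- with(DF, tapply(x, list(day = day, month = month), FUN = sum))

print(attempt)
print(fixed)
```

Two things differ between the calls: the index is a list of both grouping variables rather than day alone, and the function is sum rather than max.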
The same cross-tabulation can be done with xtabs, which sums the left-hand side of the formula by default:
xtabs(x~day+month, DF)
#   month
#day 1 2
#  1 9 7
#  2 4 8
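If one row per day/month pair is preferred over the cross-tab layout, the table can be reshaped with as.data.frame. This is a sketch under the assumption that the data frame is rebuilt from the question's values; tab and long are illustrative names.

```r
# Rebuild the example data from the question.
DF <- data.frame(
  x     = c(5, 4, 1, 3, 5, 2, 5, 3),
  day   = c(1, 1, 2, 2, 1, 1, 2, 2),
  month = c(1, 1, 1, 1, 2, 2, 2, 2)
)

# Cross-tabulate the sums of x by day and month.
tab <- xtabs(x ~ day + month, data = DF)

# Convert the table to long format: one row per day/month pair.
# day and month come back as factors; the sums land in a column
# named Freq by default.
long <- as.data.frame(tab)
print(long)
```

The Freq column name is the default for tables; it holds the sums here, not counts.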
Or with by:
by(DF[1], DF[-1], FUN = sum)