1

Pivoting and subtotals are common auxiliary steps in spreadsheets and SQL.

Assume a data.table with the fields date, myCategory, revenue. Assume that you want to know the proportion of day revenue of all revenue and the proportion of day revenue within different subgroup such that

 b[,{
    #First auxiliary variable of all revenue
    totalRev = sum(revenue)                     #SUBGROUP OF ALL REV

    #Second auxiliary variable of revenue by date, syntax wrong! How to do this?
    {totalRev_date=sum(revenue), by=list(date)} #DIFFERENT SUBGROUP, by DATE's rev

    #Within the subgroup by date and myCategory, we will use 1st&2nd auxiliary vars
    .SD[,.(Revenue_prop_of_TOT=revenue/totalRev,
          ,Revenue_prop_of_DAY=revenue/totalRev_date)    ,by=list(myCategory,date)]
    },]

where we need to compute the auxiliary sums, all revenue of specific day and all revenue of whole history.

The end result should look like this:

date            myCategory       Revenue_prop_of_TOT         Revenue_prop_of_DAY
2019-01-01      Cat1             0.002                       0.2
...

where you see that the auxiliary variables are only help functions.

How can you pivot and compute subtotals within R data.table?

hhh
  • 50,788
  • 62
  • 179
  • 282

3 Answers3

2

Another option using data.table::cube:

cb <- cube(DT, sum(value), by=c("date","category"), id=TRUE)

cb[grouping==0L, .(date, category,

    PropByDate = V1 / cb[grouping==1L][.SD, on="date", x.V1],

    PropByCategory = V1 / cb[grouping==2L][.SD, on="category", x.V1],

    PropByTotal = V1 / cb[grouping==3L, V1]
)]

output:

   date category PropByDate PropByCategory PropByTotal
1:    1        1  0.3333333      0.2500000         0.1
2:    1        2  0.6666667      0.3333333         0.2
3:    2        1  0.4285714      0.7500000         0.3
4:    2        2  0.5714286      0.6666667         0.4

data:

DT <- data.table(date=c(1, 1, 2, 2), category=c(1, 2, 1, 2), value=1:4)

#   date category value
#1:    1        1     1
#2:    1        2     2
#3:    2        1     3
#4:    2        2     4
chinsoon12
  • 25,005
  • 4
  • 25
  • 35
  • what is the benefit of data.table::cube? This is the description `"Calculate aggregates at various levels of groupings producing multiple (sub-)totals. Reflects SQLs GROUPING SETS operations.`. – hhh Jul 18 '19 at 06:30
  • 1
    its actually coming from sql where a lot of aggregations are preprocessed to speed up subsequent analysis. i see that your problem is doing a lot of aggregation and cube came to mind – chinsoon12 Jul 18 '19 at 07:04
  • 1
    I didn't know about cube, more info [here](https://jozef.io/r912-datatable-grouping-sets/). It seems it's useful if you want to replicate the excel pivot tables, add subtotals, etc. – marbel Jul 18 '19 at 15:28
  • 2
    The cube is fantastic, also `groupingsets` is something to look at. If I recall correctly, cube can be implemented with `groupingsets`, hence a little more generic for pivoting in R. +1 for finding this awesome tool. – hhh Jan 14 '20 at 07:20
  • 1
    @hhh you recall correctly, take a look at `data.table:::cube.data.table` – jangorecki Jan 17 '20 at 08:33
1

Hopefully I'm understanding correctly what you intend but please let me know in the comments if you need a different output.

b = data.table(date = rep(seq.Date(Sys.Date()-99, Sys.Date(), "days"), each=2), 
               myCategory = c("a", "b"), 
               revenue = rnorm(100, 200))


# global total, just create a constant
totalRev = b[, sum(revenue)]

# Total revenue at myCategory and date level / total Revenue
b[, Revenue_prop_of_TOT:=sum(revenue)/totalRev, by=.(myCategory, date)]

# you can calculate totalRev_date independently
b[, totalRev_date:=sum(revenue), by=date]

# If these are all the columns you have you don't need the sum(revenue) and by calls
b[, Revenue_prop_of_DAY:=sum(revenue)/totalRev_date, by=.(myCategory, date)]

Finally I would wrap it in a function.

revenue_total <- function(b){ 
  totalRev = b[, sum(revenue)]
  b[, Revenue_prop_of_TOT:=sum(revenue)/totalRev, by=.(myCategory, date)]
  b[, totalRev_date:=sum(revenue), by=date]
  b[, Revenue_prop_of_DAY:=sum(revenue)/totalRev_date, by=.(myCategory, date)]
  b
}

b = revenue_total(b)
marbel
  • 7,560
  • 6
  • 49
  • 68
0

Options for pivoting and subtotals in R

  1. cube answered here

  2. groupingsets commented by marbel here

hhh
  • 50,788
  • 62
  • 179
  • 282