1

I have a big data set of sales data like this:

  Ordernumber     Category   Sold_Items
1 123             A          2
2 123             B          1
3 234             C          1
4 345             D          1
5 456             A          2
6 456             B          1

And I want to aggregate it to this:

  A   B   C   D     frequency
  2   1             2
          1         1
              1     1 

So, basically I want to have one row for every category-combination which exists in my sales data. And I also want to know how frequent one combination is.

Well, I tried it with cast and melt and they get me there, but only when the data set is small enough. Unfortunately, I have like >3 million rows of data which cast and melt cannot handle any more.

Can anyone tell me how to aggregate my data in a fast way?

Thank you in advance!

Siruphuhn
  • 111
  • 1
  • 8

1 Answers1

3

You may try

library(data.table)#v1.9.5+
dcast(setDT(df1), Ordernumber~Category, value.var='Sold_Items')[,
        frequency:=do.call(pmax, c(.SD, na.rm=TRUE)), .SDcols=2:5]
akrun
  • 874,273
  • 37
  • 540
  • 662