How to aggregate big data?

Question

I have a big data set of sales data like this:

  Ordernumber     Category   Sold_Items
1 123             A          2
2 123             B          1
3 234             C          1
4 345             D          1
5 456             A          2
6 456             B          1

And I want to aggregate it to this:

  A   B   C   D     frequency
  2   1             2
          1         1
              1     1

So, basically I want to have one row for every category-combination which exists in my sales data. And I also want to know how frequent one combination is.

Well, I tried it with cast and melt and they get me there, but only when the data set is small enough. Unfortunately, I have like >3 million rows of data which cast and melt cannot handle any more.

Can anyone tell me how to aggregate my data in a fast way?

Thank you in advance!

score 3 · Accepted Answer · answered May 15 '15 at 07:53

3

You may try

library(data.table)#v1.9.5+
dcast(setDT(df1), Ordernumber~Category, value.var='Sold_Items')[,
        frequency:=do.call(pmax, c(.SD, na.rm=TRUE)), .SDcols=2:5]

answered May 15 '15 at 07:53

akrun

874,273
37
540
662

How to aggregate big data?

1 Answers1