R: Making pivot table with dplyr or reshape2 package

Question

I am trying to make a simple pivot table in R using the dplyr or reshape2 package, because my dataset is too large and R runs out of memory with sqldf. The two columns of my dataset that I want to build the pivot table from are "Product" and "Cust_Id". I want to count the number of customers per product. This is what I have so far:

library(reshape2)
mydata<-read.table("Book1.txt",header=TRUE,fill=TRUE)
mydata.m<-melt(mydata,id=c("Product"),measured=c(Cust_Id))
mydata.d<-dcast(mydata.m,Product~variable,count)

It returns

Error in UseMethod("group_by_"):
no applicable method for 'group_by_' applied to an object of class "c('integer','numeric')"

I have also tried dplyr with the code below (I am not sure about the last step, though, as I ran it on another laptop):

library(dplyr)
mydata.df<-tbl_df(mydata)
summarize(mydata.df,Product,Cust_Id=n())

I got no error message, but a lot of values seem to be missing from the output. I would really appreciate your input. Thanks in advance.

could you dput() part of your data and share an example of the result you're looking for? — mtoto, Jan 04 '16 at 13:29

score 0 · Answer 1 · answered Jan 04 '16 at 13:37

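Try this. Group by Product first, then count the rows in each group with n(); summarise() without a preceding group_by() summarises the whole table at once instead of per product: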

library(dplyr)
mydata <- mydata %>%
  group_by(Product) %>%
  summarise(nCustomers = n())

Alternatively, if you only want to count unique customers, you can do:

library(dplyr)
mydata <- mydata %>%
  group_by(Product) %>%
  summarise(nCustomers = n_distinct(Cust_Id))
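
To make the difference between the two versions concrete, here is a minimal sketch on made-up data. The `toy` data frame and its values are hypothetical, not taken from the question; only the column names `Product` and `Cust_Id` come from it.

```r
library(dplyr)

# Hypothetical toy data: customer 1 appears twice for product A
toy <- data.frame(
  Product = c("A", "A", "A", "B", "B"),
  Cust_Id = c(1, 1, 2, 3, 4)
)

counts <- toy %>%
  group_by(Product) %>%
  summarise(
    nRows      = n(),                 # counts every row, repeats included
    nCustomers = n_distinct(Cust_Id)  # counts each customer once per product
  )

print(counts)
```

Storing the result under a new name (`counts` here) rather than assigning back to `mydata` keeps the original data available for further checks.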

score 0 · Answer 2 · edited Jan 04 '16 at 21:42


If this really is a big data set, then your best option is the data.table package:

require(data.table)

mydata_data_table = data.table(mydata)

number_customer = mydata_data_table[, .(number_customers = .N), by=Product]
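
If only distinct customers should be counted, a possible variant uses `uniqueN()` from data.table. This is a sketch on made-up data: the `toy` table and its values are hypothetical. It also uses `setDT()`, which converts the data frame in place rather than copying it as `data.table()` does, and that may matter when memory is tight.

```r
library(data.table)

# Hypothetical toy data: customer 1 appears twice for product A
toy <- data.frame(
  Product = c("A", "A", "A", "B", "B"),
  Cust_Id = c(1, 1, 2, 3, 4)
)

setDT(toy)  # convert to a data.table by reference, without a copy

counts <- toy[, .(number_rows      = .N,               # all rows per product
                  number_customers = uniqueN(Cust_Id)), # distinct customers
              by = Product]

print(counts)
```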

answered Jan 04 '16 at 14:23 by David · edited Jan 04 '16 at 21:42 by alistaire