I am trying to make a simple pivot table in R using the dplyr or reshape2 packages, because my dataset is too large and R runs out of memory with sqldf. The two columns I want to build the pivot table from are "Product" and "Cust_Id". I want to count the number of customers per product. This is what I have so far:

library(reshape2)
mydata<-read.table("Book1.txt",header=TRUE,fill=TRUE)
mydata.m<-melt(mydata,id=c("Product"),measured=c(Cust_Id))
mydata.d<-dcast(mydata.m,Product~variable,count)

It returns

Error in UseMethod("group_by_"):
no applicable method for 'group_by_' applied to an object of class "c('integer','numeric')"

I have also tried dplyr with the code below (I am not sure about the last step, as I ran it on another laptop):

library(dplyr)
mydata.df<-tbl_df(mydata)
summarize(mydata.df,Product,Cust_Id=n())  

I got no error message, but a lot of values seem to be missing from the output. I would really appreciate your input. Thanks in advance.

alistaire
May Y
    could you dput() part of your data and share an example of the result you're looking for? – mtoto Jan 04 '16 at 13:29

2 Answers

Try this:

library(dplyr)
mydata <- mydata %>%
  group_by(Product) %>%
  summarise(nCustomers = n())

Alternatively, if you only want to count unique customers, you can do:

library(dplyr)
mydata <- mydata %>%
  group_by(Product) %>%
  summarise(nCustomers = n_distinct(Cust_Id))
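
To see how the two counts differ, here is a minimal sketch on made-up data. The `toy` data frame and the `nRows` column name are assumptions for illustration only; they are not taken from the asker's Book1.txt. Customer 1 buys product A twice, so the row count and the distinct-customer count disagree for that product.

```r
library(dplyr)

# Hypothetical toy data standing in for the real file;
# customer 1 appears twice under product A.
toy <- data.frame(
  Product = c("A", "A", "A", "B", "B"),
  Cust_Id = c(1, 1, 2, 3, 4)
)

counts <- toy %>%
  group_by(Product) %>%
  summarise(
    nRows      = n(),                 # every row, repeat customers included
    nCustomers = n_distinct(Cust_Id)  # each customer counted once per product
  )

print(counts)
```

If the two columns match for every product in your real data, there are no repeat customers and either version gives the same answer.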
Gopala

If this really is a big data set, then your best option is the data.table package:

library(data.table)

# Convert the data frame to a data.table
mydata_data_table <- data.table(mydata)

# .N is the number of rows in each Product group
number_customer <- mydata_data_table[, .(number_customers = .N), by = Product]
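
Note that `.N` counts rows, so a customer who appears several times for the same product is counted several times. If distinct customers are wanted, one possible variation uses data.table's `uniqueN()`. This is a sketch on made-up data; the `toy` table and the `number_rows` column name are assumptions for illustration, not part of the original data.

```r
library(data.table)

# Hypothetical toy data; customer 1 appears twice under product A.
toy <- data.table(
  Product = c("A", "A", "A", "B", "B"),
  Cust_Id = c(1, 1, 2, 3, 4)
)

counts <- toy[, .(
  number_rows      = .N,               # rows per product
  number_customers = uniqueN(Cust_Id)  # distinct customers per product
), by = Product]

print(counts)
```

For a data frame that is already in memory, `setDT(mydata)` converts it to a data.table by reference instead of making a copy, which may help when memory is tight.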
alistaire
David