I'm trying to build a predictive model with a customer database.
I have a dataset with 3,000 customers. In the test dataset, each customer has 300 observations and 20 variables (including the dependent variable). I also have a score dataset with 50 observations and 19 variables (excluding the dependent variable) for each unique customer ID. The test dataset is in a separate file, with each customer identified by a unique ID variable; the score dataset is likewise identified by a unique ID variable.
I'm developing a predictive model based on randomForest. Below is a sample for a single customer. I'm not sure how to automatically fit the model for each customer, predict from it, and store the fitted models efficiently.
install.packages("randomForest")  # package name must be quoted
library(randomForest)
sales <- read.csv("C:/rdata/test.csv", header = TRUE)
sales_score <- read.csv("C:/rdata/score.csv", header = TRUE)
## RandomForest for a single customer
sales.rf <- randomForest(Sales ~ ., data = sales, ntree = 500, importance = TRUE)
## Predictions for the score data
sales.rf.test <- predict(sales.rf, newdata = sales_score)
I am very familiar with SAS and am just beginning to learn R. As SAS programmers will know, many SAS procedures support BY-group processing, for example:
proc gam data = test;
by id;
model y = x1 x2 x3;
score data = test out = pred;
run;
This SAS program would fit a GAM model for each unique ID and apply each model to the test set for that ID. Is there an R equivalent?
I would greatly appreciate any examples or thoughts.
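To make the goal concrete, here is a rough sketch of what I imagine the R version might look like, using `split()` and `lapply()` from base R. It assumes both files share an ID column named `ID` (the real column name may differ) and that the response is `Sales`; `fit_and_score_by_id` is just a name I made up for illustration, and I have not tried this on the full data.

```r
library(randomForest)

# Hypothetical helper: fit one random forest per customer and score that
# customer's rows. Assumes both data frames contain the column named in id_col
# and that the training data contains the response column Sales.
fit_and_score_by_id <- function(train, score, id_col = "ID", ntree = 500) {
  train_by_id <- split(train, train[[id_col]])
  score_by_id <- split(score, score[[id_col]])
  # Only customers present in both datasets can be fitted and scored
  ids <- intersect(names(train_by_id), names(score_by_id))

  results <- lapply(ids, function(id) {
    tr <- train_by_id[[id]]
    sc <- score_by_id[[id]]
    tr[[id_col]] <- NULL  # drop the ID so it is not used as a predictor
    fit <- randomForest(Sales ~ ., data = tr, ntree = ntree, importance = TRUE)
    list(model = fit,
         pred  = data.frame(id = id, pred = predict(fit, newdata = sc)))
  })
  names(results) <- ids
  results
}
```

Calling `res <- fit_and_score_by_id(sales, sales_score)` would give a list with one model and one prediction table per customer. The fitted models could then presumably be saved in one go with `saveRDS()`, though I don't know whether keeping 3,000 forests in memory is practical.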
Thanks so much