using lm() and predict() inside data.table

Question

My goal is to #1 predict sales for the 15th and the 16th by ItemNumber and store using the below data. #2 show that prediction as a column in this data or a subset of it so I can look at a row and see the predictor variables next to the prediction for each day.

Data.table is what I am most comfortable with. I would like to use it to achieve my goals if possible.

Below, DT is a table of historical sales for two items in two retail store locations and variables that impacted those sales through 14 July. Included in DT are two "future days" the 15th and 16th of July along with some known variables that will impact sales on those days.

https://stackoverflow.com/questions/23947245/use-predict-on-data-table-with-linear-regression

Using the above post among others and advice from user 42 I can get as far as the next line. D1 is historical UnitSales and D2 is the future days I want to predict UnitSales for.

resultsofpredict=D1[,predict.lm(lm(UnitSales~.,.SD,na.action = na.exclude),D2[.BY]),by=c("ItemNumber","Store"),.SDcols=testcols]

Now, How do I properly combine or merge resultsofpredict with D2 so I can see the prediction, date, and all predictor variables together on the same row--the prediction V1 as column of D2? Here are attempts that do not work and end up with 16 rows when I only wanted 8. Can someone please post an example using my data so I can visualize it as I am new to this?

combinedresult=merge(D2,resultsofpredict )

combinedresult=merge(resultsofpredict,D2)

Thank you.

library(data.table)
library(lubridate)

zz="
Store   Date    ItemNumber  RetailPrice Backstock   UnitSales   typea   typeb   typec   Sunday  Monday  Tuesday Wednesday   Friday  Saturday
1   7/8/2018    500 1.2 5   20  0   1   0   1   0   0   0   0   0
1   7/9/2018    500 1.2 9   10  0   1   0   0   1   0   0   0   0
1   7/10/2018   500 0.8 0   10  1   0   0   0   0   1   0   0   0
1   7/11/2018   500 0.8 0   8   1   0   0   0   0   0   1   0   0
1   7/12/2018   500 0.8 0   7   1   0   0   0   0   0   0   0   0
1   7/13/2018   500 1.2 3   18  0   1   0   0   0   0   0   1   0
1   7/14/2018   500 1.2 0   21  0   1   0   0   0   0   0   0   1
1   7/15/2018   500 1.2 3   99  0   1   0   1   0   0   0   0   0
1   7/16/2018   500 0.8 0   99  1   0   0   0   1   0   0   0   0
1   7/8/2018    600 1.2 0   18  0   1   0   1   0   0   0   0   0
1   7/9/2018    600 1.2 0   11  0   1   0   0   1   0   0   0   0
1   7/10/2018   600 0.8 0   12  1   0   0   0   0   1   0   0   0
1   7/11/2018   600 0.8 0   4   1   0   0   0   0   0   1   0   0
1   7/12/2018   600 0.8 0   5   1   0   0   0   0   0   0   0   0
1   7/13/2018   600 1.2 0   13  0   1   0   0   0   0   0   1   0
1   7/14/2018   600 1.2 0   29  0   1   0   0   0   0   0   0   1
1   7/15/2018   600 1.2 2   99  0   1   0   1   0   0   0   0   0
1   7/16/2018   600 0.8 0   99  1   0   0   0   1   0   0   0   0
2   7/8/2018    500 1.2 0   10  0   1   0   1   0   0   0   0   0
2   7/9/2018    500 1.2 0   6   0   1   0   0   1   0   0   0   0
2   7/10/2018   500 0.8 0   5   1   0   0   0   0   1   0   0   0
2   7/11/2018   500 0.8 0   5   1   0   0   0   0   0   1   0   0
2   7/12/2018   500 0.8 3   5   1   0   0   0   0   0   0   0   0
2   7/13/2018   500 1.2 3   12  0   1   0   0   0   0   0   1   0
2   7/14/2018   500 1.2 9   14  0   1   0   0   0   0   0   0   1
2   7/15/2018   500 1.2 3   99  0   1   0   1   0   0   0   0   0
2   7/16/2018   500 0.8 0   99  1   0   0   0   1   0   0   0   0
2   7/8/2018    600 1.2 0   14  0   1   0   1   0   0   0   0   0
2   7/9/2018    600 1.2 0   11  0   1   0   0   1   0   0   0   0
2   7/10/2018   600 0.8 0   7   1   0   0   0   0   1   0   0   0
2   7/11/2018   600 0.8 0   3   1   0   0   0   0   0   1   0   0
2   7/12/2018   600 0.8 0   5   1   0   0   0   0   0   0   0   0
2   7/13/2018   600 1.2 0   12  0   1   0   0   0   0   0   1   0
2   7/14/2018   600 1.2 0   17  0   1   0   0   0   0   0   0   1
2   7/15/2018   600 1.2 0   99  0   1   0   1   0   0   0   0   0
2   7/16/2018   600 0.8 0   99  1   0   0   0   1   0   0   0   0
"

load and prepare data

DT=as.data.table(read.table(text=zz,fill=TRUE,header=TRUE))
setkey(DT,ItemNumber,Store)

I used 99 only as a place holder in the table to load an example. Removing it here.

DT[UnitSales==99,UnitSales:=(value=NA_integer_)]
DT = DT[, Date:=  as.Date(Date, "%m/%d/%Y")]

variables used to predict UnitSales. The actual data has numerous columns I want to keep but that do not help predict UnitSales so these are the names of the subset of columns I will use to predict.

testcols=c('RetailPrice','Saturday','Sunday','Tuesday')

Splitting DT into history and new data

d="07/14/2018"
d=as.Date(d,"%m/%d/%Y")

Historical sales

D1=DT[Date < d+1]

"Future" days to predict UnitSales for

D2=DT[Date> d]

table showing coefficients from D1

coeftable=D1[ , as.list(coef(lm(UnitSales~.,.SD,na.action = NULL))) , 
               by= c("ItemNumber","Store"),.SDcols=testcols]

The error message seems fairly clear. You have two "by" arguments. — IRTFM, Aug 31 '18 at 02:15

using lm() and predict() inside data.table

0 Answers0