My goal is to #1 predict sales for the 15th and the 16th by ItemNumber and store using the below data. #2 show that prediction as a column in this data or a subset of it so I can look at a row and see the predictor variables next to the prediction for each day.
Data.table is what I am most comfortable with. I would like to use it to achieve my goals if possible.
Below, DT is a table of historical sales for two items in two retail store locations and variables that impacted those sales through 14 July. Included in DT are two "future days" the 15th and 16th of July along with some known variables that will impact sales on those days.
https://stackoverflow.com/questions/23947245/use-predict-on-data-table-with-linear-regression
Using the above post among others and advice from user 42 I can get as far as the next line. D1 is historical UnitSales and D2 is the future days I want to predict UnitSales for.
resultsofpredict=D1[,predict.lm(lm(UnitSales~.,.SD,na.action = na.exclude),D2[.BY]),by=c("ItemNumber","Store"),.SDcols=testcols]
Now, How do I properly combine or merge resultsofpredict with D2 so I can see the prediction, date, and all predictor variables together on the same row--the prediction V1 as column of D2? Here are attempts that do not work and end up with 16 rows when I only wanted 8. Can someone please post an example using my data so I can visualize it as I am new to this?
combinedresult=merge(D2,resultsofpredict )
combinedresult=merge(resultsofpredict,D2)
Thank you.
library(data.table)
library(lubridate)
zz="
Store Date ItemNumber RetailPrice Backstock UnitSales typea typeb typec Sunday Monday Tuesday Wednesday Friday Saturday
1 7/8/2018 500 1.2 5 20 0 1 0 1 0 0 0 0 0
1 7/9/2018 500 1.2 9 10 0 1 0 0 1 0 0 0 0
1 7/10/2018 500 0.8 0 10 1 0 0 0 0 1 0 0 0
1 7/11/2018 500 0.8 0 8 1 0 0 0 0 0 1 0 0
1 7/12/2018 500 0.8 0 7 1 0 0 0 0 0 0 0 0
1 7/13/2018 500 1.2 3 18 0 1 0 0 0 0 0 1 0
1 7/14/2018 500 1.2 0 21 0 1 0 0 0 0 0 0 1
1 7/15/2018 500 1.2 3 99 0 1 0 1 0 0 0 0 0
1 7/16/2018 500 0.8 0 99 1 0 0 0 1 0 0 0 0
1 7/8/2018 600 1.2 0 18 0 1 0 1 0 0 0 0 0
1 7/9/2018 600 1.2 0 11 0 1 0 0 1 0 0 0 0
1 7/10/2018 600 0.8 0 12 1 0 0 0 0 1 0 0 0
1 7/11/2018 600 0.8 0 4 1 0 0 0 0 0 1 0 0
1 7/12/2018 600 0.8 0 5 1 0 0 0 0 0 0 0 0
1 7/13/2018 600 1.2 0 13 0 1 0 0 0 0 0 1 0
1 7/14/2018 600 1.2 0 29 0 1 0 0 0 0 0 0 1
1 7/15/2018 600 1.2 2 99 0 1 0 1 0 0 0 0 0
1 7/16/2018 600 0.8 0 99 1 0 0 0 1 0 0 0 0
2 7/8/2018 500 1.2 0 10 0 1 0 1 0 0 0 0 0
2 7/9/2018 500 1.2 0 6 0 1 0 0 1 0 0 0 0
2 7/10/2018 500 0.8 0 5 1 0 0 0 0 1 0 0 0
2 7/11/2018 500 0.8 0 5 1 0 0 0 0 0 1 0 0
2 7/12/2018 500 0.8 3 5 1 0 0 0 0 0 0 0 0
2 7/13/2018 500 1.2 3 12 0 1 0 0 0 0 0 1 0
2 7/14/2018 500 1.2 9 14 0 1 0 0 0 0 0 0 1
2 7/15/2018 500 1.2 3 99 0 1 0 1 0 0 0 0 0
2 7/16/2018 500 0.8 0 99 1 0 0 0 1 0 0 0 0
2 7/8/2018 600 1.2 0 14 0 1 0 1 0 0 0 0 0
2 7/9/2018 600 1.2 0 11 0 1 0 0 1 0 0 0 0
2 7/10/2018 600 0.8 0 7 1 0 0 0 0 1 0 0 0
2 7/11/2018 600 0.8 0 3 1 0 0 0 0 0 1 0 0
2 7/12/2018 600 0.8 0 5 1 0 0 0 0 0 0 0 0
2 7/13/2018 600 1.2 0 12 0 1 0 0 0 0 0 1 0
2 7/14/2018 600 1.2 0 17 0 1 0 0 0 0 0 0 1
2 7/15/2018 600 1.2 0 99 0 1 0 1 0 0 0 0 0
2 7/16/2018 600 0.8 0 99 1 0 0 0 1 0 0 0 0
"
load and prepare data
DT=as.data.table(read.table(text=zz,fill=TRUE,header=TRUE))
setkey(DT,ItemNumber,Store)
I used 99 only as a place holder in the table to load an example. Removing it here.
DT[UnitSales==99,UnitSales:=(value=NA_integer_)]
DT = DT[, Date:= as.Date(Date, "%m/%d/%Y")]
variables used to predict UnitSales. The actual data has numerous columns I want to keep but that do not help predict UnitSales so these are the names of the subset of columns I will use to predict.
testcols=c('RetailPrice','Saturday','Sunday','Tuesday')
Splitting DT into history and new data
d="07/14/2018"
d=as.Date(d,"%m/%d/%Y")
Historical sales
D1=DT[Date < d+1]
"Future" days to predict UnitSales for
D2=DT[Date> d]
table showing coefficients from D1
coeftable=D1[ , as.list(coef(lm(UnitSales~.,.SD,na.action = NULL))) ,
by= c("ItemNumber","Store"),.SDcols=testcols]