
Prophet users of the world, I hope all is well. I'm having some difficulty with a particular use case, which I'll try to illustrate with the sample data and code below. First, let's generate some sample data so that it is easier to see what I am talking about.

library(data.table)
library(prophet)
library(dplyr)

# one year of months to be used for generating predictions
ds = c('2016-01-01', '2016-02-01','2016-03-01','2016-04-01','2016-05-01','2016-06-01','2016-07-01','2016-08-01','2016-09-01','2016-10-01','2016-11-01','2016-12-01' )

# historical customer counts
y = c (78498,12356,93732,5556,410,10296,9779,744,16407,100484,23954,141398,10575,850,16334,17496,1643,28074,93181,
       18770,129968,11590,850,16738,17510,1376,27931,94369,18444,134850,13386,919,19075,18050,1565,31296,112094,27995,
       167094,13402,1422,22766,20072,2340,37863,87346,16180,119863,7691,725,16931,12163,1241,25872,87455,16322,116390,
       6994,620,13524,11059,990,22188,105473,23652,154145,13520,1008,18857,19209,1632,31105,102252,21284,138779,11670,
       918,16078,16679,1257,26755,115033,22415,139835,13965,936,18027,18642,1407,28622,155371,40556,174321,25119,1859,
       35326,28844,2962,51582,108817,19158,109864,8693,756,14358,13390,1091,21419)

# the segment channels of the customers
segment_channel = c('Existing_Omni', 'Existing_Retail', 'Existing_Direct', 'NTB_Omni', 'NTB_Retail', 'NTB_Direct', 'React_Omni', 'React_Retail', 'React_Direct')

# an external regressor to be added to the model (in my data there are about 40 of these regressor variables that I would like to add)
flash_sale = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,
               2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3)

fake_data = merge(ds,segment_channel, all.y=TRUE)
setnames(fake_data, 'x', 'ds')
setnames(fake_data, 'y', 'segment_channel')
nrow(fake_data) # should be 108 rows, the 9 customer segments for each of the months in 2016

# next join the known customer counts, let's say we have them for the first 7 months of the year (January through July)

fake_data = cbind(fake_data, y)
fake_data = cbind(fake_data, flash_sale)

# set some of the y values to NA so we can pretend we are trying to predict them using the ds time series as well as the flash sale values,
# which will be known in advance

fake_data = as.data.table(fake_data)
fake_data$ds = as.Date(fake_data$ds)
fake_data[, y := ifelse(ds >= '2016-08-01', NA, y)]

This code generates a data set fairly similar to the one I am working with, so hopefully you can reproduce what I am doing. There are essentially two things I would like to do with this data. The first is fairly straightforward: I want to add a regressor (like flash_sale in this example) to the prophet model that I create. I can do this fairly easily like so:

christ <- tibble(
  holiday = 'christ',
  ds = as.Date(c('2016-11-01', '2017-11-01', '2018-11-01',
                 '2019-11-01')),
  lower_window = 0,
  upper_window = 1
)

nye <- tibble(
  holiday = 'nye',
  ds = as.Date(c('2016-11-01', '2017-12-01', '2018-11-01',
                 '2019-11-01')),
  lower_window = 0,
  upper_window = 1
)

holidays <- bind_rows(nye, christ)

m <- prophet(holidays = holidays)
m <- add_regressor(m, name = "flash_sale")
m <- fit.prophet(m, fake_data)
forecast <- predict(m, fake_data)


prophet_plot_components(m, forecast)

This should generate a fairly ugly plot, but it shows that this approach does the trick for the data, and I could repeat the add_regressor line to add additional regressors. OK, so we're all good so far. The other issue is that I have 9 segment channels that I'm dealing with, and I don't want to build a separate model for each of them. Luckily, I found a pretty good answer on Stack Overflow that accomplishes the grouped Prophet prediction: Using Prophet Package to Predict By Group in Dataframe in R

fcst = fake_data %>%  
  group_by(segment_channel) %>%
  do(predict(
    prophet(., seasonality.mode = 'multiplicative', holidays = holidays,
            seasonality.prior.scale = 10, changepoint.prior.scale = .034),
    make_future_dataframe(prophet(.), periods = 11, freq = 'month')
  )) %>%
  dplyr::select(ds, segment_channel, yhat)

fcst
> fcst
# A tibble: 207 x 3
# Groups:   segment_channel [9]
   ds                  segment_channel   yhat
   <dttm>              <fct>            <dbl>
 1 2016-01-01 00:00:00 Existing_Direct 38712.
 2 2016-02-01 00:00:00 Existing_Direct 40321.
 3 2016-03-01 00:00:00 Existing_Direct 42648.
 4 2016-04-01 00:00:00 Existing_Direct 45130.
 5 2016-05-01 00:00:00 Existing_Direct 46580.
 6 2016-06-01 00:00:00 Existing_Direct 49437.
 7 2016-07-01 00:00:00 Existing_Direct 50651.
 8 2016-08-01 00:00:00 Existing_Direct 52685.
 9 2016-09-01 00:00:00 Existing_Direct 54719.
10 2016-10-01 00:00:00 Existing_Direct 56687.
# ... with 197 more rows

This is more or less exactly what I want! Cool. So now all I have to do is figure out how to get my grouped predictions and my regressors added all in one step. I know I can have multi-line statements inside of do, so this is what I tried in order to get this to work:

> fcst = fake_data %>%  
+   group_by(segment_channel) %>%
+   do(
+     predict(prophet(., seasonality.mode = 'multiplicative', holidays = holidays, seasonality.prior.scale = 10, changepoint.prior.scale = .034), 
+     add_regressor(prophet(., holidays = holidays), name = 'flash_sale'),
+     fit.prophet(prophet(. , holidays = holidays)),
+     make_future_dataframe(prophet(.), periods = 11, freq='month'))) %>% 
+   dplyr::select(ds, segment_channel, yhat)
Disabling yearly seasonality. Run prophet with yearly.seasonality=TRUE to override this.
Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
n.changepoints greater than number of observations. Using 4
Disabling yearly seasonality. Run prophet with yearly.seasonality=TRUE to override this.
Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
n.changepoints greater than number of observations. Using 4
Error in add_regressor(prophet(., holidays = holidays), name = "flash_sale") : 
  Regressors must be added prior to model fitting.

Darn. It looks like it was running, but something about how I tried to add the regressor wasn't kosher. Next I tried it this way:

> fcst = fake_data %>%  
+   group_by(segment_channel) %>%
+   do(
+     prophet(holidays = holidays),
+     add_regressor(prophet(., holidays = holidays), name = 'flash_sale'),
+     fit.prophet(prophet(. , holidays = holidays)),
+     predict(prophet(., seasonality.mode = 'multiplicative', holidays = holidays, seasonality.prior.scale = 10, changepoint.prior.scale = .034),
+     make_future_dataframe(prophet(.), periods = 11, freq='month'))) %>% 
+   dplyr::select(ds, segment_channel, yhat)
Error: Can only supply one unnamed argument, not 4
Call `rlang::last_error()` to see a backtrace
> fcst = fake_data %>%  
+   group_by(segment_channel) %>%
+   do(
+     add_regressor(prophet(., holidays = holidays), name = 'flash_sale'),
+     fit.prophet(prophet(. , holidays = holidays)),
+     predict(prophet(., seasonality.mode = 'multiplicative', holidays = holidays, seasonality.prior.scale = 10, changepoint.prior.scale = .034),
+     make_future_dataframe(prophet(.), periods = 11, freq='month'))) %>% 
+   dplyr::select(ds, segment_channel, yhat)
Error: Can only supply one unnamed argument, not 3
Call `rlang::last_error()` to see a backtrace
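
For context on the last two errors: do() accepts only a single unnamed expression, so several comma-separated calls get rejected before any of them run. Multiple statements can instead be grouped in one braced block whose last value is a data frame. Below is a minimal sketch of that shape only, with Prophet left out. The data frame toy and its columns grp and x are made up for illustration.

```r
library(dplyr)

# Toy data (hypothetical, for illustration only): two groups, three rows each
toy <- data.frame(
  grp = rep(c("a", "b"), each = 3),
  x   = c(1, 2, 3, 10, 20, 30)
)

res <- toy %>%
  group_by(grp) %>%
  do({
    # Several statements inside ONE braced block; `.` is the current group's rows
    centred <- .$x - mean(.$x)
    rng <- range(.$x)
    # The last value must be a data frame; do() binds one result per group
    data.frame(x = .$x, centred = centred, span = rng[2] - rng[1])
  })

print(res)
```

This only demonstrates the shape do() expects. It says nothing about whether the Prophet steps (create the model, add_regressor, fit.prophet, predict) behave as intended when placed inside such a block.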

I'm super confused at this point, so I'm just hoping someone out on the interwebs might know just the right incantation I need to get where I'm going.

jane_thompson