I am collaborating on a project that requires me to use R of which I don't have any experience with to date. I am trying to apply auto arima to partitions/windows within my dataset and I haven't the slightest clue on how to even begin.3
Essentially, I want to train a separate model on each partner_id using the rows c_id = "none" and then forecast/predict values out to the max(date) for each partner_id. The number of months/rows for each partner vary in length. For this example data frame pasted below, partner_id = "1A9" has 12 months/rows with c_id = "none" vs partner_id = "1B9" has 13 months/row with c_id = "none". The number of months/rows extended out to the max(Date) within each partner_is varies as well. This is tricky as I assume I need to dynamically input how many months/rows to train on and how many months/rows to predict on for each partner_id.
I've included a sample dataset below.
x <- data.frame("c_id" = c("none","none","none","none","none",
"none","none","none","none","none","none","none","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101", "none","none","none","none","none","none","none","none","none","none","none","none","none","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111"), "partner_id" = c("1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9"), "rev_month" = as.Date(c("2016-01-01","2016-01-01","2016-02-01","2016-03-01","2016-04-01","2016-05-01","2016-06-01","2016-07-01","2016-08-01", "2016-09-01","2016-10-01","2016-11-01","2016-12-01","2017-01-01","2017-02-01","2017-03-01","2017-04-01","2017-05-01","2017-06-01","2017-07-01","2017-08-01","2017-09-01","2017-10-01","2017-11-01","2017-12-01","2018-01-01","2018-02-01","2018-03-01","2018-04-01","2018-05-01","2018-06-01","2018-07-01","2018-08-01","2018-09-01","2018-10-01","2018-11-01","2018-12-01", "2017-01-01","2017-01-01","2017-02-01","2017-03-01","2017-04-01","2017-05-01","2017-06-01","2017-07-01","2017-08-01", "2017-09-01","2017-10-01","2017-11-01","2017-12-01","2018-01-01","2018-02-01","2018-03-01","2018-04-01","2018-05-01","2018-06-01","2018-07-01","2018-08-01","2018-09-01","2018-10-01","2018-11-01","2018-12-01","2019-01-01","2019-02-01","2019-03-01","2019-04-01","2019-05-01","2019-06-01","2019-07-01","2019-08-01","2019-09-01","2019-10-01","2019-11-01","2019-12-01", "2020-01-01", "2020-02-01", "2020-03-01")), "rev" = c(101.25, 102.25, 103.50, 103.75, 104.15, 104.25, 104.3, 105.00, 105.20, 105.60, 106.00, 106.10, 106.50, 101.50, 100.30, 107.50, 108.30, 108.45, 109.10, 110.10, 112.15, 112.45, 114.65, 115.00, 116.00, 116.50, 117.25, 117.85, 119.25, 119.95, 120.20, 121.50, 122.30, 122.40, 123.25, 123.75, 124.00, 101.25, 102.25, 103.50, 103.75, 104.15, 104.25, 104.3, 105.00, 105.20, 105.60, 106.00, 106.10, 106.50, 101.50, 100.30, 107.50, 108.30, 108.45, 109.10, 110.10, 112.15, 112.45, 114.65, 115.00, 116.00, 116.50, 117.25, 117.85, 119.25, 119.95, 120.20, 121.50, 122.30, 122.40, 123.25, 123.75, 124.00, 124.10, 125.35, 125.45), stingsAsFactors=FALSE)
My apologies for not having any code starter code yet as I am still trying to think about this conceptually while not having much experience with R at all. Ultimately, I'd like to add the column of predictions and confidence intervals back to my original dataframe. I'd be open to any R and/or Python Solutions.