I'm using auto_arima from pmdarima to fit several time series through a groupby. That is, I have a pd.DataFrame of stacked, time-indexed data, grouped by the column variable, and have successfully applied transform(pm.auto_arima) to each group. The reproducible example finds boring best ARIMA models, but the idea seems to work. I now want to apply .predict() in the same way, but cannot get it to work with apply, a lambda, or any combination of the two.

The code below works until the # Forecasting - help! section. Apparently the function I pass to apply is not receiving the object I expect. How might I adapt one of test1, test2, or test3 to get what I want? Or is there some other best-practice construct to consider? Would it be better to work across columns (without the melt), or to use a loop?

Ultimately, I hope that test1, say, is a stacked pd.DataFrame (or pd.Series at least) with 8 rows: 4 forecasted values for each of the 2 time series in this example, with an identifier column variable (possibly tacked on after the fact).

import pandas as pd 
import pmdarima as pm
import itertools

# Get data - this is OK. 
url = 'https://raw.githubusercontent.com/nickdcox/learn-airline-delays/main/delays_2018.csv'
keep = ['arr_flights', 'arr_cancelled']

# Setup data - this is OK. 
df = pd.read_csv(url, index_col=0)
df.index = pd.to_datetime(df.index, format = "%Y-%m")
df = df[keep]
df = df.sort_index()
df = df.loc['2018']
df = df.groupby(df.index).sum()
df.reset_index(inplace = True)
df = df.melt(id_vars = 'date', value_vars = df.columns.to_list()[1:])

# Fit auto.arima for each time series - this is OK. 
fit = df.groupby('variable')['value'].transform(pm.auto_arima).drop_duplicates() 
fit = fit.to_frame(name = 'model') 
fit['variable'] = keep
fit.reset_index(drop = True, inplace = True)

# Setup forecasts - this is OK. 
max_date = df.date.max()
dr = pd.to_datetime(pd.date_range(max_date, periods = 4 + 1, freq = 'MS').tolist()[1:])
yhat = pd.DataFrame(list(itertools.product(keep, dr)), columns = ['variable', 'date']) 
yhat.set_index('date', inplace = True)

# Forecasting - help! - Can't get any of these to work.
def predict_fn(obj): 
  return(obj.loc[0].predict(4))
predict_fn(fit.loc[fit['variable'] == 'arr_flights']['model'])                             # Appears to work! 

test1 = fit.groupby('variable')['model'].apply(lambda x: x.predict(n_periods = 4))         # Try 1: 'Series' object has no attribute 'predict'.
test2 = fit.groupby('variable')['model'].apply(lambda x: x.loc[0].predict(n_periods = 4))  # Try 2: KeyError
test3 = fit.groupby('variable')['model'].apply(predict_fn)                                 # Try 3: KeyError
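
One possible way around the errors above, offered as a sketch rather than a tested answer. Each group that apply receives is a one-element pd.Series that keeps its original index label, so x.predict fails (a Series has no predict) and x.loc[0] raises a KeyError for any group whose row is not labelled 0; x.iloc[0] selects by position instead. A plain loop over the rows of fit sidesteps the groupby altogether. In the code below, ConstantModel is a hypothetical stand-in for a fitted pmdarima model (it only imitates predict(n_periods)), and the levels and dates are made up so the example runs without downloading data.

```python
import numpy as np
import pandas as pd


class ConstantModel:
    """Hypothetical stand-in for a fitted model; only imitates predict(n_periods)."""

    def __init__(self, level):
        self.level = level

    def predict(self, n_periods):
        return np.full(n_periods, self.level, dtype=float)


# Same layout as `fit` in the question: one row per series, model stored as an object.
fit = pd.DataFrame({
    "variable": ["arr_flights", "arr_cancelled"],
    "model": [ConstantModel(100.0), ConstantModel(5.0)],
})

n_periods = 4
# Assumed forecast dates: the four month starts after December 2018.
future = pd.date_range("2019-01-01", periods=n_periods, freq="MS")

frames = []
for var, model in zip(fit["variable"], fit["model"]):
    # np.asarray accepts either an array or a Series returned by predict.
    yhat = np.asarray(model.predict(n_periods=n_periods))
    frames.append(pd.DataFrame({"variable": var, "date": future, "yhat": yhat}))

# Stack the per-series forecasts into one long frame.
forecasts = pd.concat(frames, ignore_index=True)
print(forecasts)
```

With real models, fit['model'] from the question would take the place of the stand-ins. The np.asarray call is there on the assumption that predict may return either an array or a Series depending on the pmdarima version, which is worth checking against the installed release.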
Nick ODell
jasmyace
  • BTW, this code swaps the models for flights and cancellations. The output of a groupby is sorted, and `keep = ['arr_flights', 'arr_cancelled']` is not in the same order as the groupby. – Nick ODell Sep 24 '22 at 18:45
  • Could be. I typically work-check time-series stuff with a plot of data with forecasts. Since I lack forecasts (so far), I didn't complete the process. – jasmyace Sep 24 '22 at 19:23
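
To illustrate the ordering point raised in the comments, here is a small sketch with made-up numbers. It shows that iterating a groupby yields keys in sorted order, which differs from the order of keep, and one way to keep each fitted object attached to its own key. fit_stub is a hypothetical placeholder for pm.auto_arima and only records the mean of each series.

```python
import pandas as pd

keep = ["arr_flights", "arr_cancelled"]

# Toy stacked data in the same long layout as the question (values are invented).
df = pd.DataFrame({
    "variable": ["arr_flights"] * 3 + ["arr_cancelled"] * 3,
    "value": [10.0, 11.0, 12.0, 1.0, 2.0, 3.0],
})

# groupby sorts its keys by default, so the order differs from `keep`.
group_order = [name for name, _ in df.groupby("variable")["value"]]
print(group_order)


def fit_stub(series):
    # Hypothetical stand-in for pm.auto_arima: just records the series mean.
    return {"mean": float(series.mean())}


# Keying each fitted object by its group name avoids relabelling after the fact.
models = {name: fit_stub(group) for name, group in df.groupby("variable")["value"]}
print(models)
```

Assigning fit['variable'] = keep after the fact relies on the two orders agreeing; carrying the group name along with the model removes that dependency.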

0 Answers