
I have a dataset of 25,544 observations and 7 explanatory variables, which I split into a training set and a test set. I then fit a GLMGam model with B-splines on the training set.

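import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from statsmodels.gam.api import GLMGam, BSplines

# dfop is the full DataFrame; drop the target and the non-feature columns
y = dfop[['RATIO_OPENING']]
X = dfop.loc[:, ~dfop.columns.isin(['MED_RATIO_OPENING','RATIO_OPENING','OD_UNDIR_CITY_PAIR','MONTH'])]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
x_spline = X_train[['DISTANCE', 'CITY_POP_A','CITY_POP_B','A_GDP_PPP_1990_2015_5arcmin','A_HDI_1990_2015','B_GDP_PPP_1990_2015_5arcmin','B_HDI_1990_2015']]
bs = BSplines(x_spline, df=[3,3,3,3,3,3,3], degree=[2,2,2,2,2,2,2])
# x_spline is passed both as the linear part (exog) and to the smoother
poisson = GLMGam(y_train, x_spline, smoother=bs, family=sm.families.Poisson())
poisson_fit = poisson.fit()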

I want to predict the dependent variable on the test set.

X_test = X_test[['DISTANCE', 'CITY_POP_A','CITY_POP_B','A_GDP_PPP_1990_2015_5arcmin','A_HDI_1990_2015','B_GDP_PPP_1990_2015_5arcmin','B_HDI_1990_2015']]
results = poisson_fit.predict(exog=X_test, transform=True)

The last line raises the following error:

ValueError: shapes (6386,7) and (21,) not aligned: 7 (dim 1) != 21 (dim 0)
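
One possible reading of the shapes in this message, sketched as plain arithmetic: the 21 would be the length of the fitted parameter vector, made up of the 7 linear columns plus the spline basis columns. The sketch assumes that `BSplines` with its default `include_intercept=False` drops one basis function per variable, so each variable with `df=3` contributes 2 columns; that assumption is not stated in the original post.

```python
# Assumption: each smooth term contributes (df - 1) columns because the
# default include_intercept=False drops one basis function per variable.
n_linear = 7                 # columns passed as the linear part (exog)
df = [3] * 7                 # df per spline variable, as in the fit above
n_spline = sum(d - 1 for d in df)

n_params = n_linear + n_spline
print(n_linear, n_spline, n_params)
```

Under that assumption the total is 21, while `predict(exog=X_test)` only supplies the 7 linear columns, which would explain why the two shapes do not align.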

What is the correct syntax for the prediction?

Lucas Snow
  • I think it's `exog_smooth=X_test` https://www.statsmodels.org/dev/generated/statsmodels.gam.generalized_additive_model.GLMGamResults.predict.html – Josef Jul 22 '20 at 00:32
  • If you use `x_spline` in the linear part and in the spline, then you have perfect collinearity, which should still be possible if the spline part is penalized enough. (But I don't remember the details.) – Josef Jul 22 '20 at 00:36
