I'm conducting a case study where I have to predict claim number per policy. Since my variable ClaimNb is not binary I can't use logistic Regression but I have to use Poisson. My code for GLM model:
import statsmodels.api as sm
import statsmodels.formula.api as smf
formula= 'ClaimNb ~ BonusMalus+VehAge+Freq+VehGas+Exposure+VehPower+Density+DrivAge'
model = smf.glm(formula = formula, data=df,
family=sm.families.Poisson())
I have also split my data
# train-test-split
train , test = train_test_split(data,test_size=0.2,random_state=0)
# seperate the target and independent variable
train_x = train.drop(columns=['ClaimNb'],axis=1)
train_y = train['ClaimNb']
test_x = test.drop(columns=['ClaimNb'],axis=1)
test_y = test['ClaimNb']
My problem now is the prediction, I have used the following but did not work:
from sklearn.linear_model import PoissonRegressor model = PoissonRegressor(alpha=1e-3, max_iter=1000)
model.fit(train_x,train_y)
predict = model.predict(test_x)
Please is there any other way to predict and check the accuracy of the model?
thanks