I am trying to estimate relativities for insurance pricing using a GLM. I'm using the freMTPL data from CASdatasets. ClaimNb is my response and Exposure is my exposure; what I'm interested in is the claim frequency ClaimNb/Exposure.
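After dividing the variables with many levels, such as driver age (18-99), into a few groups (e.g. 5 categories), I grouped the data using

```r
library(dplyr)
# Collapse the policy-level rows into one row per combination of rating factors
data_grouped_freq <- data_freq4 %>%
  group_by(Power, Brand, Gas, Region, CarAge_cat, DriverAge_cat, Density_cat) %>%
  summarise(ClaimNb  = sum(ClaimNb),   # total claims in the cell
            Exposure = sum(Exposure),  # total exposure in the cell
            .groups  = "drop")
```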
I then fit the GLM with

```r
# Poisson GLM on the grouped cells, with Exposure passed as prior weights
model_freq <- glm(ClaimNb ~ Power + Brand + Gas + Region + CarAge_cat + DriverAge_cat + Density_cat,
                  family = poisson, data = data_grouped_freq, weights = Exposure)
summary(model_freq)
```

and the result is:
```
Deviance Residuals:
     Min        1Q    Median        3Q       Max
-255.241    -2.634    -0.929    -0.202   199.629

Coefficients:
                         Estimate Std. Error z value Pr(>|z|)
(Intercept)             4.8629082  0.0011698 4156.99   <2e-16 ***
Powerd                 -0.4660131  0.0014613 -318.90   <2e-16 ***
Powere                 -0.7155983  0.0013723 -521.44   <2e-16 ***
Powerg                 -0.4131892  0.0010905 -378.89   <2e-16 ***
...
RegionPoitou-Charentes -2.3903228  0.0052288 -457.14   <2e-16 ***
CarAge_cat1            -1.2547176  0.0021645 -579.68   <2e-16 ***
DriverAge_cat1         -0.7913098  0.0022811 -346.90   <2e-16 ***
DriverAge_cat2         -1.2886084  0.0024688 -521.96   <2e-16 ***
```
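For reference, this is what the `glm` call above fits, writing $y_i$ for ClaimNb, $w_i$ for Exposure and $x_i$ for the dummy-coded rating factors of grouped cell $i$. It follows R's `poisson()` family, in which prior weights multiply each cell's deviance contribution, and `glm` chooses $\beta$ to minimise $D(\beta)$:

$$
\log \mu_i = x_i^\top \beta,
\qquad
D(\beta) = 2 \sum_i w_i \left[ y_i \log\frac{y_i}{\mu_i} - (y_i - \mu_i) \right],
$$

with $y_i \log(y_i/\mu_i)$ taken as $0$ when $y_i = 0$. Here $\mu_i$ is the expected ClaimNb of cell $i$, and Exposure enters only through the weights $w_i$, not through the linear predictor.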
I know this is wrong: DriverAge_cat1 has a higher ClaimNb/Exposure ratio than the reference level, so its relativity should be > 1, but the fitted value exp(-0.7913098) ≈ 0.45 is not. (The ClaimNb/Exposure ratio is 0.134 for cat1, compared to 0.071 for the reference level of DriverAge_cat.)
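With a log link and R's default treatment contrasts, a level's fitted relativity is the exponential of its coefficient, while the observed one-way relativity is the ratio of claim frequencies. Plugging in the DriverAge_cat1 estimate from the output and the two frequencies quoted above makes the mismatch explicit:

$$
\text{observed: } \frac{0.134}{0.071} \approx 1.89,
\qquad
\text{fitted: } \exp\!\left(\hat\beta_{\text{DriverAge\_cat1}}\right) = \exp(-0.7913098) \approx 0.453 .
$$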
Can someone explain what I am doing wrong? Could the large number of cells with zero claims be causing problems, or am I using the weights incorrectly? There are 14661 cells in total across the 7 variables.
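One way to look at the zero-claim question on the grouped data is to count the zero-claim cells, see how much exposure they carry, and recompute the one-way observed frequencies. The sketch below is a hypothetical example: it uses base R only and a small made-up data frame `cells` in place of `data_grouped_freq`. It also assumes the first factor level is the reference level, which matches R's default treatment contrasts in `glm`.

```r
# Hypothetical toy stand-in for data_grouped_freq (one row per rating cell)
cells <- data.frame(
  DriverAge_cat = factor(c("0", "0", "1", "1", "2", "2")),
  ClaimNb       = c(0, 2, 1, 3, 0, 0),
  Exposure      = c(5, 15, 10, 10, 2, 3)
)

# How many cells have no claims, and how much exposure sits in them
n_zero        <- sum(cells$ClaimNb == 0)
share_zero    <- mean(cells$ClaimNb == 0)
exposure_zero <- sum(cells$Exposure[cells$ClaimNb == 0])

# One-way observed claim frequency (total ClaimNb / total Exposure) per level
claims_by_age   <- tapply(cells$ClaimNb, cells$DriverAge_cat, sum)
exposure_by_age <- tapply(cells$Exposure, cells$DriverAge_cat, sum)
freq_by_age     <- claims_by_age / exposure_by_age

# Observed relativity: each level's frequency divided by the first (reference) level's
relativity_by_age <- freq_by_age / freq_by_age[[1]]

print(c(n_zero = n_zero, share_zero = share_zero, exposure_zero = exposure_zero))
print(round(relativity_by_age, 3))
```

On the real data, substituting `data_grouped_freq` for `cells` and printing `exp(coef(model_freq))` puts the observed and fitted relativities side by side. One-way relativities need not match the GLM's exactly, because the GLM adjusts for the other rating factors at the same time.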