First post, so go easy.
In insurance GLM work, the classic approach is to model claims frequency and average severity separately. With that in mind, I built a couple of models to experiment with and now have a question.
Could somebody please explain how GLM handles varying levels of summarisation of a dataset, particularly with regard to error estimates?
Consider the example below. The data exhibit strong severity trends for both variables:
- A has more expensive claims than B
- Ford > Kia > Vaux > Jag
I fitted the same model to the unsummarised dataset and to a summarised version of it, and glm returned the same parameter estimates in both cases.
However, glm indicates a well-fitted model on the unsummarised data, whereas when I summarise and model the mean per cell (i.e. average severity), weighted by the number of claims, the model fits poorly. Maybe this is to be expected; after all, the unsummarised data has more points to model with. Also, it appears the weights only indicate RELATIVE strength, so here specifying them is pointless, since all the weights are the same.
But more fundamentally, can I not model average severity with a GLM? I know that fitting a GLM to an unsummarised dataset produces an average severity, but I was hoping to fit a model to already summarised data. It appears that modelling on aggregated datasets will not give a true indication of the model fit.
Apologies if this is a stupid question; I'm not a statistician, so I don't fully understand the Hessian matrix.
Please see code below:
library(plyr)  # provides ddply() and summarise(); current versions of reshape no longer attach it
dataset <- data.frame(
  Person = rep(c("A", "B"), each = 200),
  Car = rep(c("Ford", "Kia", "Vaux", "Jag"), 2, each = 50),
  Amount = c(rgamma(50, 200), rgamma(50, 180), rgamma(50, 160), rgamma(50, 140),
             rgamma(50, 100), rgamma(50, 80), rgamma(50, 60), rgamma(50, 40))
)
Agg1 <- ddply(dataset, .(Person, Car), summarise, mean=mean(Amount), length=length(Amount))
m1 <- glm(Amount ~ Person + Car, data = dataset, family = Gamma(link="log"))
m2 <- glm(mean ~ Person + Car, data = Agg1, family = Gamma(link="log"), weights=length)
summary(m1)
summary(m2)
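A minimal sketch of one way to put the two fits side by side. It assumes a fixed seed (`set.seed(1)`) and uses base R's `aggregate()` as a stand-in for `ddply()`; neither is in the code above, and the names `agg`, `n` and `m3` are introduced only for this illustration. It prints the coefficients, the residual degrees of freedom and the estimated dispersion of each fit, and refits the summarised model without weights to check the point about equal weights.

```r
set.seed(1)  # assumption: fixed seed so the run is reproducible

dataset <- data.frame(
  Person = rep(c("A", "B"), each = 200),
  Car    = rep(c("Ford", "Kia", "Vaux", "Jag"), times = 2, each = 50),
  Amount = c(rgamma(50, 200), rgamma(50, 180), rgamma(50, 160), rgamma(50, 140),
             rgamma(50, 100), rgamma(50, 80),  rgamma(50, 60),  rgamma(50, 40))
)

# Summarise to one row per Person/Car cell with base R (stand-in for ddply)
agg <- aggregate(Amount ~ Person + Car, data = dataset, FUN = mean)
names(agg)[names(agg) == "Amount"] <- "mean"
agg$n <- aggregate(Amount ~ Person + Car, data = dataset, FUN = length)$Amount

m1 <- glm(Amount ~ Person + Car, data = dataset, family = Gamma(link = "log"))
m2 <- glm(mean ~ Person + Car, data = agg, family = Gamma(link = "log"), weights = n)
m3 <- glm(mean ~ Person + Car, data = agg, family = Gamma(link = "log"))  # no weights

# Point estimates from the unsummarised and summarised fits
print(cbind(unsummarised = coef(m1), summarised = coef(m2)))

# Residual degrees of freedom and estimated dispersion of each fit
print(c(df_m1 = df.residual(m1), df_m2 = df.residual(m2)))
print(c(disp_m1 = summary(m1)$dispersion, disp_m2 = summary(m2)$dispersion))

# Standard errors of the summarised fit with and without the (equal) weights
se2 <- coef(summary(m2))[, "Std. Error"]
se3 <- coef(summary(m3))[, "Std. Error"]
print(cbind(weighted = se2, unweighted = se3))
```

With 400 rows and 5 parameters the dispersion is estimated on 395 residual degrees of freedom; with 8 summarised rows only 3 remain, so the standard errors of the summarised fit rest on a far less precise dispersion estimate.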
Thanks,
Nick