I need to compute predicted probabilities manually. I'm working from the code in this post. I want to drop one variable from a fitted model but keep the original coefficients for the remaining terms, so that I can predict on another period. My formula is:
> lr$formula
target ~ grupoAntig + nu_seguros_1TRUNC + cd_sexo + grupoEdad +
vl_limite_aeQU + vl_ltd_6QU + Revolv3 + nu_servicios_1TRUNC +
fl_cliente_hit + nu_resumen_6 + fl_rv
I want to delete fl_cliente_hit. So I'm using model.matrix and excluding it:
mm<-model.matrix(~ grupoAntig + nu_seguros_1TRUNC + cd_sexo + grupoEdad +
vl_limite_aeQU + vl_ltd_6QU + Revolv3 + nu_servicios_1TRUNC +
nu_resumen_6 + fl_rv, train)[,]
So the first line of this matrix is:
> mm[1,]
        (Intercept)       grupoAntigh20       grupoAntigm40
                  1                   0                   1
  nu_seguros_1TRUNC            cd_sexoF            cd_sexoM
                  0                   0                   1
       grupoEdadh25        grupoEdadm40   vl_limite_aeQU145
                  0                   1                   0
        vl_ltd_6QU5             Revolv3 nu_servicios_1TRUNC
                  0                   0                   0
       nu_resumen_6              fl_rv1
                  4                   0
As I understand it, each factor should contribute (number of levels - 1) columns to the matrix. For example:
> ddply(train, .(grupoEdad ), summarize, cant=length(target))
grupoEdad cant
1 25a40 7864
2 h25 60
3 m40 11684
And the matrix only includes 2 of those 3 levels (h25 and m40), as you can see in mm[1,].
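To make the expected behaviour concrete, here is a minimal sketch on made-up data. The data frame `toy` and the object `mm_toy` are hypothetical; only the level names are borrowed from grupoEdad. It assumes R's default treatment contrasts.

```r
# Toy data: level names mirror grupoEdad, the rows themselves are made up.
toy <- data.frame(grupoEdad = factor(c("25a40", "h25", "m40", "m40")))

# With an intercept and default treatment contrasts, the first level
# ("25a40") is the baseline and gets no column of its own.
mm_toy <- model.matrix(~ grupoEdad, toy)
colnames(mm_toy)
# "(Intercept)" "grupoEdadh25" "grupoEdadm40"
```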
But the problem is with cd_sexo:
> ddply(train, .(cd_sexo), summarize, cant=length(target))
cd_sexo cant
1 F 8962
2 M 10646
It shows only 2 levels in train, yet the matrix includes a column for each of them (cd_sexoF and cd_sexoM).
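One thing that may be worth checking, as an assumption and not something confirmed for this data: a factor can carry a declared level that has no rows, and a grouped count may not list it. The sketch below uses a hypothetical data frame `sexo_toy` with an invented empty-string level to show how that situation would produce both columns.

```r
# Hypothetical factor with a third, unused level "" (invented for illustration).
sexo_toy <- data.frame(
  cd_sexo = factor(c("F", "M", "M"), levels = c("", "F", "M"))
)

levels(sexo_toy$cd_sexo)  # every declared level, used or not
table(sexo_toy$cd_sexo)   # counts per level, including zero counts

# The unused first level acts as the baseline, so F and M both get a column.
cols_with_unused <- colnames(model.matrix(~ cd_sexo, sexo_toy))

# After droplevels(), only (number of levels - 1) columns remain.
cols_dropped <- colnames(model.matrix(~ cd_sexo, droplevels(sexo_toy)))
```

Running `levels(train$cd_sexo)` on the real data would show whether this applies here.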
My problem is that, since I want to predict probabilities by hand, I will use coef(lr), and its entries will not match the columns of the mm matrix one to one.
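For reference, this is roughly the calculation I have in mind, sketched on toy data. It assumes lr is a binomial glm with a logit link; `toy`, `fit`, `x1`, `x2` and `drop_me` are hypothetical stand-ins, not objects from my real model. Coefficients are matched to matrix columns by name, which only works when both sets of names agree.

```r
set.seed(1)
# Toy data: x1, x2 and drop_me are hypothetical stand-ins for the real predictors.
toy <- data.frame(x1 = rnorm(50), x2 = rnorm(50), drop_me = rbinom(50, 1, 0.5))
toy$target <- rbinom(50, 1, plogis(0.5 * toy$x1 - 0.3 * toy$x2))

# Assumed model type: logistic regression.
fit <- glm(target ~ x1 + x2 + drop_me, data = toy, family = binomial)

# Design matrix without the excluded variable.
mm_toy <- model.matrix(~ x1 + x2, toy)

# Keep only the coefficients whose names match a column of the matrix.
b <- coef(fit)[colnames(mm_toy)]
stopifnot(!anyNA(b))  # fails if a column has no matching coefficient

# Linear predictor, then inverse logit to get probabilities.
p <- plogis(drop(mm_toy %*% b))
```

Dropping a term while keeping the other coefficients gives the same result as predicting with that variable set to zero (or to its baseline level, for a factor), so `predict(fit, newdata, type = "response")` on such data is a useful cross-check.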