
I need to predict probabilities manually, working from the code in this post. I want to drop one variable from a fitted model but keep the original coefficients of the remaining variables, so that I can predict for another period. My formula is:

> lr$formula
target ~ grupoAntig + nu_seguros_1TRUNC + cd_sexo + grupoEdad + 
    vl_limite_aeQU + vl_ltd_6QU + Revolv3 + nu_servicios_1TRUNC + 
    fl_cliente_hit + nu_resumen_6 + fl_rv

I want to delete fl_cliente_hit. So I'm using model.matrix and excluding it:

mm <- model.matrix(~ grupoAntig + nu_seguros_1TRUNC + cd_sexo + grupoEdad + 
    vl_limite_aeQU + vl_ltd_6QU + Revolv3 + nu_servicios_1TRUNC + 
    nu_resumen_6 + fl_rv, train)[, ]

So the first row of this matrix is:

> mm[1,]
        (Intercept)       grupoAntigh20       grupoAntigm40 
                  1                   0                   1 
  nu_seguros_1TRUNC            cd_sexoF            cd_sexoM 
                  0                   0                   1 
       grupoEdadh25        grupoEdadm40   vl_limite_aeQU145 
                  0                   1                   0 
        vl_ltd_6QU5             Revolv3 nu_servicios_1TRUNC 
                  0                   0                   0 
       nu_resumen_6              fl_rv1 
                  4                   0 

I expected this to keep (number of levels - 1) dummy columns for each factor. For example:

> ddply(train, .(grupoEdad  ), summarize, cant=length(target))
  grupoEdad  cant
1     25a40  7864
2       h25    60
3       m40 11684 

And the matrix only includes 2 of those 3 levels, as you can see in mm[1,].
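As a minimal illustration of that (levels - 1) behaviour, here is a sketch on made-up data rather than on `train`. The data frame `toy` and the object `mm_toy` are hypothetical names used only for this example; it assumes R's default treatment contrasts.

```r
# Hypothetical toy data: a factor with three levels, like grupoEdad
toy <- data.frame(grupoEdad = factor(c("25a40", "h25", "m40", "m40")))

# With default treatment contrasts the first level ("25a40") is the baseline,
# so only the other two levels get a dummy column
mm_toy <- model.matrix(~ grupoEdad, toy)
colnames(mm_toy)
# "(Intercept)" "grupoEdadh25" "grupoEdadm40"
```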

But the problem is with cd_sexo:

> ddply(train, .(cd_sexo), summarize, cant=length(target))
  cd_sexo  cant
1       F  8962
2       M 10646

It only has 2 levels, yet the matrix includes a column for each of them.
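One possible explanation, which the output above does not confirm, is that the factor declares a third level that never occurs in the data. `model.matrix` builds its columns from the declared levels, not from the observed values, so an unused baseline level would leave both F and M with a column. The sketch below uses a made-up vector `sexo` (not the real `cd_sexo`) to show the effect and how `droplevels()` changes it; checking `levels(train$cd_sexo)` would show whether this applies here.

```r
# Hypothetical factor: only F and M occur, but an empty-string level is declared
sexo <- factor(c("F", "M", "M"), levels = c("", "F", "M"))

table(sexo)                      # the unused "" level shows up with count 0
cols_before <- colnames(model.matrix(~ sexo))
cols_before                      # "" is the baseline, so sexoF and sexoM both appear

# droplevels() removes unused levels; F then becomes the baseline
sexo2 <- droplevels(sexo)
cols_after <- colnames(model.matrix(~ sexo2))
cols_after                       # only sexo2M remains next to the intercept
```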

My problem is that, since I want to predict probabilities, I will use coef(lr), and it will have a different number of entries than the mm matrix has columns.

GabyLP
  • What does `table(train$cd_sexo, useNA = 'always')` look like? Or `with(train, table(cd_sexo, target, useNA = 'always'))`; not sure what target is. – rawr Nov 11 '15 at 20:49
  • Why do you need to predict probabilities manually instead of using a built-in method? – alexwhitworth Nov 12 '15 at 00:25
  • The basic approach is to do column subsetting on the model matrix via some sort of matching (such as the name) and to similarly do index subsetting on the coefficient vector.... But it's unclear to me why you wouldn't just use the built-in methods. – alexwhitworth Nov 12 '15 at 00:27
  • For example, here's a related question/answer: http://stackoverflow.com/questions/25538199/design-matrix-for-mlm-from-librarylme4-with-fixed-and-random-effects – alexwhitworth Nov 12 '15 at 00:29
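
The name-matching approach described in the comments can be sketched as follows. This is a minimal example on made-up data, assuming the model is a logistic regression fitted with `glm`; `toy`, `fit`, `x1`, `x2`, and `g` are hypothetical stand-ins for `train`, `lr`, and the real predictors, with `x2` playing the role of the dropped variable.

```r
set.seed(1)

# Hypothetical toy data standing in for `train`
toy <- data.frame(
  x1 = rnorm(200),
  x2 = rnorm(200),
  g  = factor(sample(c("a", "b", "c"), 200, replace = TRUE))
)
toy$target <- rbinom(200, 1, plogis(0.5 * toy$x1 - 0.3 * toy$x2))

# Hypothetical stand-in for the fitted model `lr`
fit <- glm(target ~ x1 + x2 + g, data = toy, family = binomial)

# Design matrix without x2 (the variable being dropped)
mm <- model.matrix(~ x1 + g, toy)

# Keep only the columns that also have a coefficient, matched by name
common <- intersect(colnames(mm), names(coef(fit)))

# Linear predictor from the original coefficients, then inverse logit
eta <- drop(mm[, common, drop = FALSE] %*% coef(fit)[common])
p_manual <- plogis(eta)
```

For a numeric predictor, leaving out its column this way gives the same result as calling `predict()` with that variable set to 0; for a factor it corresponds to setting every row to the baseline level. Matching by name only works if the factor levels in the new data produce the same column names as in the fitted model.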
