
I want to run a simple multivariate logistic regression. I made a small example with binary data below to talk through the problem.

multivariate regression = trying to predict 2+ outcome variables

> y = matrix(c(0,0,0,1,1,1,1,1,1,0,0,0), nrow=6,ncol=2)

> x = matrix(c(1,0,0,0,0,0,1,1,0,0,0,0,1,1,1,0,0,0,1,1,1,1,0,0,1,1,1,1,1,0,1,1,1,1,1,1), nrow=6,ncol=6)
> x
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1    1    1    1    1
[2,]    0    1    1    1    1    1
[3,]    0    0    1    1    1    1
[4,]    0    0    0    1    1    1
[5,]    0    0    0    0    1    1
[6,]    0    0    0    0    0    1
> y
     [,1] [,2]
[1,]    0    1
[2,]    0    1
[3,]    0    1
[4,]    1    0
[5,]    1    0
[6,]    1    0

So, variable "x" has 6 samples and each sample has 6 attributes. Variable "y" has 2 outcome values for each of the 6 samples. I specifically want to work with binary data.

> fit = glm(y~x-1, family = binomial(logit))

I use "-1" to eliminate the intercept coefficient. Everything else is standard logistic regression in a multivariate setting.
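As a quick sanity check on the "-1", model.matrix shows the design matrix this formula builds: six columns for x and no intercept column.

> model.matrix(~ x - 1)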

> fit

Call:  glm(formula = y ~ x - 1, family = binomial(logit))

Coefficients:
 data1   data2   data3   data4   data5   data6  
  0.00    0.00  -49.13    0.00    0.00   24.57  

Degrees of Freedom: 6 Total (i.e. Null);  0 Residual
Null Deviance:      8.318 
Residual Deviance: 2.572e-10    AIC: 12

At this point things are starting to look off. I am not sure why the coefficients for data3 and data6 are what they are.

> val <- predict(fit, data.frame(c(1,1,1,1,1,1)), type = "response")

> val
       1            2            3            4            5            6 
2.143345e-11 2.143345e-11 2.143345e-11 1.000000e+00 1.000000e+00 1.000000e+00 

Logically I am doing something wrong. I am expecting a 1x2 matrix, not 1x6. I want a matrix that tells me the probability of the data frame vector being a "1" (true) in y1 and y2.

Any help would be appreciated.

Note: I updated the ending of my question based on the reply from Mario.

logic8

2 Answers


Unlike lm, glm does not work with multivariate response variables. (With family = binomial, a two-column matrix response is instead interpreted as counts of successes and failures, which is why your call runs but does not fit two separate outcomes.) As a workaround, you can fit a separate GLM for each column of y:

fit1 <- glm(y[,1] ~ x-1, family=binomial(logit))
fit2 <- glm(y[,2] ~ x-1, family=binomial(logit))

Or you can use glmer from the lme4 package, which is meant for mixed models, but you can simply omit the "random effects". AFAIK, glmer supports multivariate responses.
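For instance, here is a minimal sketch of the per-column workaround with the data from the question; I convert x to a data frame first so that predict() can match newdata by column name (dat and newobs are just illustrative names):

dat <- as.data.frame(x)                    # predictors as columns V1 ... V6
fit1 <- glm(y[, 1] ~ . - 1, family = binomial(logit), data = dat)
fit2 <- glm(y[, 2] ~ . - 1, family = binomial(logit), data = dat)

newobs <- as.data.frame(matrix(1, nrow = 1, ncol = 6))   # the (1,1,1,1,1,1) sample
names(newobs) <- names(dat)

# combine into the 1x2 result asked for in the question: P(y1 = 1), P(y2 = 1)
c(predict(fit1, newobs, type = "response"),
  predict(fit2, newobs, type = "response"))

Note that separate fits like this ignore any correlation between the two response columns.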

cdalitz

The argument newdata needs to be a data.frame. You can do this:

aux <- data.frame(c(1,1,1,1,1,1))
val <- predict(fit, aux, type = "response")
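One thing to watch: the model formula is y ~ x - 1, so predict() looks for a variable literally named x in newdata. The auto-generated column name of data.frame(c(1,1,1,1,1,1)) does not match, so predict() falls back to the x in the formula's environment and returns the fitted values of the six original rows, which is why val has six entries. Passing the new row as a 1x6 matrix named x should give a single prediction instead (a sketch, not tested against your exact objects):

newdata <- list(x = matrix(c(1,1,1,1,1,1), nrow = 1))
predict(fit, newdata, type = "response")

Even then, with the two-column y this is still one probability (for the first column of y, given how glm interpreted the response), not the 1x2 result.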
Mario
  • First, you are right that I need to pass a data frame object to the "predict" function. Secondly, the output value of "val" is a 1x6 matrix. I am trying to get a 1x2 matrix that tells me the probability of the vector [1,1,1,1,1,1] being a "1" (true) in y1 and y2. – logic8 Mar 09 '17 at 17:21
  • Is it possible to make one model whose target is y1 and another whose target is y2, and concatenate the outputs? – Mario Mar 10 '17 at 08:49
  • I see what you are saying. I think there is some sort of correlation between the dependent variables that multivariate logistic regression would pick up, and fitting separate models and concatenating the outputs would take much longer to run if you have 50-100 dependent variables. – logic8 Mar 10 '17 at 19:20
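Following up on the comment thread: if you end up with 50-100 response columns, the same per-column idea can be put in a loop. A rough sketch, reusing the illustrative dat and newobs objects from the sketch in the other answer (separate fits still ignore any correlation between the responses):

fits  <- lapply(seq_len(ncol(y)), function(j)
  glm(y[, j] ~ . - 1, family = binomial(logit), data = dat))
probs <- sapply(fits, predict, newdata = newobs, type = "response")
probs   # one predicted probability per response column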