
How does one perform a multivariate (multiple dependent variables) logistic regression in R?

I know how to do this for linear regression, and the following works:

form <- cbind(A, B, C, D) ~ shopping_pt + price
mlm.model.1 <- lm(form, data = train)

But when I try the following for logistic regression, it does not work:

model.logistic <- glm(form, family = binomial(link = "logit"), data = train)

Thank you for your help.

To add: it appears that my code above for doing this with linear models may not be correct. I am trying to do what is outlined in this document, which some might find useful:

ftp://ftp.cis.upenn.edu/pub/datamining/public_html/ReadingGroup/papers/multiResponse.pdf

blast00
  • I might be wrong, but I think the code you have for `lm` just fits four separate models, one for each dependent variable. I mean, you can just fit four logistic models. – Julián Urbano May 09 '14 at 00:49
  • I do not believe this is correct. It accounts for the correlation matrices between the 4 dependent variables, and simultaneously predicts 4 models accounting for those interdependencies. – blast00 May 09 '14 at 01:06
  • Are you sure? Any reference for that? – Julián Urbano May 09 '14 at 01:09
  • Look here: http://socserv.socsci.mcmaster.ca/jfox/Books/Companion/appendix/Appendix-Multivariate-Linear-Models.pdf and http://www.ats.ucla.edu/stat/stata/dae/mvreg.htm (although Stata) – blast00 May 09 '14 at 01:14
  • thanks for the links. The second one though mentions this: "The individual coefficients, as well as their standard errors will be the same as those produced by the multivariate regression. However, the OLS regressions will not produce multivariate results, nor will they allow for testing of coefficients across equations" – Julián Urbano May 09 '14 at 01:29
  • Can you expand a bit? That seems to support the point that multivariate prediction is significantly different from the predictions (and associated standard errors / residuals) of separately estimated models. The key is that I want to account for the correlations / relationships between my dependent variables. E.g., if I know that every time you buy red socks and a green shirt you also buy blue shoes, then if I estimate that it is likely for you to buy red socks and a green shirt, this should also play into my prediction that you will buy blue shoes. Hopefully this makes sense. – blast00 May 09 '14 at 01:35
  • You should consider adding a `dput()` call with a sample of your dataset or provide a reproducible example. – marbel May 09 '14 at 02:08
  • You can do this with a generalized linear mixed model (GLMM) package if you 'stack' your data appropriately: `MCMCglmm` (see chapter 5 of the Course Notes) or `lme4` (see http://rpubs.com/bbolker/3336) should work, although the multivariate examples given there are multivariate normal. If you give a reproducible example I will consider posting a worked answer ... – Ben Bolker May 09 '14 at 02:21 [a rough sketch of this stacking approach follows these comments]
  • Do you know whether multivariate predictions can be made with MCMCglmm? I am not interested in testing significance; my primary interest is prediction. Before I continue trying to reshape my data, since you wrote the course notes, I was hoping you might have the answer. – blast00 May 14 '14 at 01:04
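Building on the GLMM "stacking" idea in Ben Bolker's comment, here is a rough, untested sketch of what the reshaping and fit might look like with `reshape2` and `lme4`. The column names come from the question's formula; the `id`, `long`, and `fit` names are invented for illustration, and the single random intercept only captures a shared correlation among A, B, C, and D rather than a full correlation matrix (for the latter, `MCMCglmm`'s multi-trait models are the better fit):

library(reshape2)
library(lme4)

# One id per original row, so the four responses from the same row share it
train$id <- seq_len(nrow(train))

# Stack the four binary responses into long format: one row per (id, response)
long <- melt(train,
             id.vars       = c("id", "shopping_pt", "price"),
             measure.vars  = c("A", "B", "C", "D"),
             variable.name = "response",
             value.name    = "y")

# Response-specific intercepts and slopes as fixed effects; the shared random
# intercept per id induces correlation among the four responses
fit <- glmer(y ~ response + response:shopping_pt + response:price + (1 | id),
             family = binomial, data = long)

Predictions for new rows would then also go through the same stacked format, e.g. predict(fit, newdata = stacked_new, re.form = NA, type = "response"), where stacked_new is hypothetical new data reshaped as above and re.form = NA drops the random effect.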

1 Answer


It seems to me that `lm(cbind(A,B,C,D) ~ shopping_pt + price)` just fits four separate models for the four dependent variables. The second link you provide even mentions:

The individual coefficients, as well as their standard errors will be the same as those produced by the multivariate regression. However, the OLS regressions will not produce multivariate results, nor will they allow for testing of coefficients across equations.

That is, all estimates will be the same; you will just have to predict four times, and hypotheses on the fitted coefficients can only be tested within each model, not across equations.

The example below shows that this is indeed the case:

> set.seed(0)
> x1 <- runif(10)
> x2 <- runif(10)
> y1 <- 2*x1 + 3*x2 + rnorm(10)
> y2 <- 4*x1 + 5*x2 + rnorm(10)
> mm <- lm(cbind(y1,y2)~x1+x2)
> m1 <- lm(y1~x1+x2)
> m2 <- lm(y2~x1+x2)
# If we look at mm, m1 and m2, we see that the coefficient estimates are identical
# If we predict new data, they give the same estimates
> x1_ <- runif(10)
> x2_ <- runif(10)
> predict(mm, newdata=list(x1=x1_, x2=x2_))
          y1       y2
1  2.9714571 5.965774
2  2.7153855 5.327974
3  2.5101344 5.434516
4  1.3702441 3.853450
5  0.9447582 3.376867
6  2.3809256 5.051257
7  2.5782102 5.544434
8  3.1514895 6.156506
9  2.4421892 5.061288
10 1.6712042 4.470486
> predict(m1, newdata=list(x1=x1_, x2=x2_))
        1         2         3         4         5         6         7         8         9        10 
2.9714571 2.7153855 2.5101344 1.3702441 0.9447582 2.3809256 2.5782102 3.1514895 2.4421892 1.6712042 
> predict(m2, newdata=list(x1=x1_, x2=x2_))
       1        2        3        4        5        6        7        8        9       10 
5.965774 5.327974 5.434516 3.853450 3.376867 5.051257 5.544434 6.156506 5.061288 4.470486

So this suggests that you can just fit four logistic models separately.
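For what it's worth, here is a minimal sketch of fitting the four binomial GLMs in one pass (assuming your train data frame holds the binary columns A, B, C, and D; test is a stand-in for whatever new data you want predictions on):

responses <- c("A", "B", "C", "D")

# Fit one logistic regression per dependent variable
models <- lapply(responses, function(r) {
  glm(reformulate(c("shopping_pt", "price"), response = r),
      family = binomial(link = "logit"), data = train)
})
names(models) <- responses

# One column of predicted probabilities per response
# ('test' is hypothetical new data with the same predictor columns)
probs <- sapply(models, predict, newdata = test, type = "response")

Each model is fit and predicted independently here, which is exactly the limitation discussed in the comments.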

Julián Urbano
  • Thank you for this; it is very insightful, but simultaneously unfortunate. However, this means that the question is still unanswered, because I want to perform a proper multivariate analysis that accounts for the correlations across the dependent variables. – blast00 May 09 '14 at 02:07