-1

I am looking to estimate parameters for a large set (274) of correlated response variables which follow a NB dist. The goal is to use the parameters for a generalized linear model for each of the variables with a two level categorical predictor. I am familiar with the MASS package in R, but there doesn't seem to be a way to do a multivariate analyses. Is there a way this can be achieved (for instance a for loop) in R.

Thanks in advance for any help/suggestions

AA1989
  • 1
  • 1
  • 2
  • 2
    Can you show some of your code? For example, what library? `lm()` function or other function? Of course, there are always ways, they might not optimal, but it is possible. – Bill Chen Nov 15 '19 at 00:51
  • unfortunately don't have access to it at the moment. can post tomorrow. I am trying to loop the glm.nb function over columns of a matrix/dataframe. – AA1989 Nov 15 '19 at 00:59
  • I’m finding this difficult to follow from a statistical perspective. Regression analyses should be selected by the likely distribution.. not of the data ... but rather of the errors around a linear predictor (possibly with a link transformation. – IRTFM Nov 15 '19 at 03:09
  • It really doesn't make sense. You have 274 response variables and you want to regress against just one predictor? What are you trying to get from this model? – StupidWolf Nov 15 '19 at 14:27
  • The response variables are cell counts from brain regions and the predictor is whether or not the organism displayed a particular behavior. So the goal is to see which regions display different cell activity between the two groups. I personally wouldn't analyze the data this way but its what I've been asked to do. – AA1989 Nov 16 '19 at 15:51

1 Answers1

0

You do first construct your model into a string,

options = c('Sex', 'Sex/(Age + Eth*Lrn)')
evaluation_str = c()
for(i in c(1:2)){
        evaluation_str <- c(evaluation_str, paste0('glm.nb(Days ~ ', options[i], ', data = quine)'))
}

> evaluation_str
[1] "glm.nb(Days ~ Sex, data = quine)"                 "glm.nb(Days ~ Sex/(Age + Eth*Lrn), data = quine)"

Then use eval() and parse() function to run a loop, I prefer lapply(), so I don't need to initiate a vector first as results container.

results <- lapply(c(1:2), function(x){
        eval(parse(text=evaluation_str[i]))
})

> results
[[1]]

Call:  glm.nb(formula = Days ~ Sex/(Age + Eth * Lrn), data = quine, 
    init.theta = 1.597990733, link = log)

Coefficients:
    (Intercept)             SexM       SexF:AgeF1       SexM:AgeF1       SexF:AgeF2       SexM:AgeF2       SexF:AgeF3  
        3.01919         -0.47541         -0.70887         -0.72373         -0.61486          0.62842         -0.34235  
     SexM:AgeF3        SexF:EthN        SexM:EthN       SexF:LrnSL       SexM:LrnSL  SexF:EthN:LrnSL  SexM:EthN:LrnSL  
        1.15084         -0.07312         -0.67899          0.94358          0.23891         -1.35849          0.76142  

Degrees of Freedom: 145 Total (i.e. Null);  132 Residual
Null Deviance:      234.6 
Residual Deviance: 167.6    AIC: 1093

[[2]]

Call:  glm.nb(formula = Days ~ Sex/(Age + Eth * Lrn), data = quine, 
    init.theta = 1.597990733, link = log)

Coefficients:
    (Intercept)             SexM       SexF:AgeF1       SexM:AgeF1       SexF:AgeF2       SexM:AgeF2       SexF:AgeF3  
        3.01919         -0.47541         -0.70887         -0.72373         -0.61486          0.62842         -0.34235  
     SexM:AgeF3        SexF:EthN        SexM:EthN       SexF:LrnSL       SexM:LrnSL  SexF:EthN:LrnSL  SexM:EthN:LrnSL  
        1.15084         -0.07312         -0.67899          0.94358          0.23891         -1.35849          0.76142  

Degrees of Freedom: 145 Total (i.e. Null);  132 Residual
Null Deviance:      234.6 
Residual Deviance: 167.6    AIC: 1093

If you want to check each individual model. You can do so:

> coef <- results[[1]]$coefficients
> coef
    (Intercept)            SexM      SexF:AgeF1      SexM:AgeF1      SexF:AgeF2      SexM:AgeF2      SexF:AgeF3 
      3.0191882      -0.4754051      -0.7088737      -0.7237349      -0.6148641       0.6284201      -0.3423469 
     SexM:AgeF3       SexF:EthN       SexM:EthN      SexF:LrnSL      SexM:LrnSL SexF:EthN:LrnSL SexM:EthN:LrnSL 
      1.1508429      -0.0731245      -0.6789869       0.9435760       0.2389097      -1.3584906       0.7614156 
Bill Chen
  • 1,699
  • 14
  • 24
  • I apologize if I'm not expressing my issue correctly, but I have one predictor variable that is categorical and 274 different variables (brain regions if relevant) with counts for each sample. So the formula would be: – AA1989 Nov 15 '19 at 01:29
  • Region 1 + Region2 + ....Region 274 ~ Condition. Here condition is the categorical predictor. So I asm trying to loop through each column where each formula is Region_N ~ Condition. I believe the above code is for having multiple categorical predictors. – AA1989 Nov 15 '19 at 01:29
  • @AA1989, if the conditions don't change, that should be even easier, you just need to change to `paste0('Region', i, ' ~ ...')` – Bill Chen Nov 15 '19 at 01:41