I am trying to do a univariate logistic regression analysis. The input is a data frame with one response variable, some demographics (age, gender and ethnicity) and more than 100 predictor variables. To analyse it I have been using:

# Function
proc_glm <- function(predictors) {
  univariate <- glm(Data$Outcome ~ predictors, family = binomial)

  return(cbind(coef(summary(univariate)), OR = exp(coef(univariate)), exp(confint(univariate))))
}

#Call Function
glm_output <- lapply(Data[5:150], proc_glm)

This works completely fine on the overall database. I then subsetted the data based on ethnicity, which I did using:

Data1 <- subset(Data,Ethnicity==0)

No obvious issue; Data1 has fewer rows than Data but the same number of variables. There are no missing data.

I then tried to run the same analysis as before, substituting Data1 for Data in both places, but I get the following error:

Error in cbind(coef(summary(univariate)), OR = exp(coef(univariate)), : number of rows of matrices must match (see arg 3)

I'm not sure what I've changed to cause the error. I'm working in RStudio version 1.2.1335.

Data looks like this:

Data <-cbind(
  data.frame(
    Age=sample(20:80,50),
    Gender=sample(0:1,size=50,replace=TRUE),
    Ethnicity=sample(0:2,size=50,replace=TRUE),
    Outcome=sample(0:1,size=50,replace=TRUE)
  ),
  data.frame(replicate(100,sample(0:2,50,rep=TRUE)))
)
beanie42
  • Did you rerun the whole thing or just this line `glm_output <- lapply(Data1[5:150], proc_glm)`? – Tob Jul 30 '21 at 10:38
  • I reran all of it. – beanie42 Jul 30 '21 at 10:38
  • What happens if you make another function called `proc_glm1` and then try that, do you still get the error? – Tob Jul 30 '21 at 10:40
  • Can you show us some of your data or create a test dataframe that we can try your code with? Would be useful in seeing what is going on. – Tob Jul 30 '21 at 10:45
  • Same error - ignore my previous response. I had missed one change so it was trying to call the original formula. – beanie42 Jul 30 '21 at 10:47
  • I've added data to the original question, I couldn't work out how to attach a file. There are 3 levels for the predictors. – beanie42 Jul 30 '21 at 10:59
  • `proc_glm` uses `Data` inside. Don't you need to replace that by `Data1`? In any case, the code does not run as is since the test data has only a few columns. Please make the example reproducible so that many people can help you. – Kota Mori Jul 30 '21 at 11:05
  • I've created a new proc_glm1 as recommended, and that has Data1 inside it. I had previously replaced it but got the same error. – beanie42 Jul 30 '21 at 11:15
  • I've created a bigger sample data set; although I don't get the same error with the randomly generated data which may suggest the issue is something in the raw data set. – beanie42 Jul 30 '21 at 12:00

1 Answer

The problem is that your function combines the argument predictors with the global variable Data, so the outcome always comes from every row of Data$Outcome, even when the predictors come from a subset. You need to pass in the outcome column as an argument so that it has the same number of rows as the predictors.

Data <-cbind(
  data.frame(
    Age=sample(20:80,50),
    Gender=sample(0:1,size=50,replace=TRUE),
    Ethnicity=sample(0:2,size=50,replace=TRUE),
    Outcome=sample(0:1,size=50,replace=TRUE)
  ),
  data.frame(replicate(100,sample(0:2,50,rep=TRUE)))
)

proc_glm <- function(predictors, outcome) {
  univariate <- glm(outcome ~ predictors, family = binomial)
  
  return(cbind(coef(summary(univariate)),OR = exp(coef(univariate)), exp(confint(univariate))))
}

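# Columns 1 to 4 are Age, Gender, Ethnicity and Outcome; the rest are predictors
glm_output <- lapply(Data[5:ncol(Data)], proc_glm, outcome=Data$Outcome)

# Same analysis on the subset, with the outcome taken from the subset too
Data1 <- subset(Data,Ethnicity==0)
glm_output <- lapply(Data1[5:ncol(Data1)], proc_glm, outcome=Data1$Outcome)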
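
A variation on the same idea, offered as a sketch rather than a drop-in replacement: pass the data frame itself plus a column name, and let glm() look up both variables through its data argument, so the outcome and the predictor always come from the same rows. The helper name fit_univariate and the toy data frame are made up for illustration; reformulate() is base R and builds the formula from strings.

```r
# Toy data, invented for illustration; real data would replace this.
toy <- data.frame(
  Ethnicity = rep(0:1, each = 12),
  Outcome   = rep(c(0, 1), 12),
  X1        = rep(0:2, 8),
  X2        = rep(c(0, 0, 1, 2, 1, 2, 2), length.out = 24)
)

# Hypothetical helper: one univariate logistic model for the column named `var`.
fit_univariate <- function(var, data, outcome = "Outcome") {
  fit <- glm(reformulate(var, response = outcome), data = data, family = binomial)
  ci <- suppressMessages(confint(fit))  # hides the profiling progress message
  cbind(coef(summary(fit)), OR = exp(coef(fit)), exp(ci))
}

# Subset first, then fit every predictor column on that subset only.
toy0 <- subset(toy, Ethnicity == 0)
predictor_names <- setdiff(names(toy0), c("Ethnicity", "Outcome"))
glm_output0 <- sapply(predictor_names, fit_univariate, data = toy0, simplify = FALSE)
```

Because the formula is built from the column name, the second row of each result is labelled with the predictor (X1, X2) instead of the generic name predictors.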
Bruce M
  • Thank you for the answer. I actually get the same error with this script. I think the issue is related to some variables having insufficient entries for the analysis. I have been using a tryCatch step around the cbind, which catches the errors. – beanie42 Aug 12 '21 at 08:28
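
The tryCatch workaround mentioned in the last comment could look roughly like the sketch below. One possible explanation for the error, not confirmed against the real data, is a predictor that takes a single value within the subset: glm() then reports an NA coefficient, coef() keeps that NA while coef(summary()) drops it, and the pieces handed to cbind() no longer have the same number of rows. The wrapper name safe_glm and the toy data are assumptions made for illustration.

```r
# Toy data, invented for illustration: x_const has a single value,
# as can happen to a predictor once the rows are subsetted.
toy <- data.frame(
  Outcome = rep(c(0, 1), 10),
  x_ok    = rep(0:2, length.out = 20),
  x_const = rep(1, 20)
)

# Hypothetical wrapper around the proc_glm logic from the answer.
safe_glm <- function(predictors, outcome) {
  tryCatch({
    univariate <- glm(outcome ~ predictors, family = binomial)
    # coef() keeps NA (aliased) terms but coef(summary()) drops them,
    # so fail early with a clear message instead of a cbind() mismatch.
    if (anyNA(coef(univariate))) stop("a coefficient could not be estimated")
    ci <- suppressMessages(confint(univariate))
    cbind(coef(summary(univariate)), OR = exp(coef(univariate)), exp(ci))
  }, error = function(e) {
    message("Skipped: ", conditionMessage(e))
    NULL  # NULL marks a predictor that could not be analysed
  })
}

results <- lapply(toy[c("x_ok", "x_const")], safe_glm, outcome = toy$Outcome)

# Names of the predictors that were skipped
skipped <- names(results)[vapply(results, is.null, logical(1))]
```

Returning NULL keeps one list entry per predictor, so the skipped variables can be listed and inspected afterwards rather than silently disappearing.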