I am using lapply to fit several glm regressions of one dependent variable on one independent variable at a time, but I'm not sure how to extract all the p-values at once.

There are 200 features in my dataset, but the code below only gives me the p-value for feature #1. How can I get a matrix of the p-values for all 200 features?

valName<- as.data.frame(colnames(repeatData))
featureName<-valName[3,]
lapply(featureName,
       function(var) {       
         formula    <- as.formula(paste("outcome ~", var))
         fit.logist <- glm(formula, data = repeatData, family = binomial)
         summary(fit.logist)
         Pvalue<-coef(summary(fit.logist))[,'Pr(>|z|)'] 
       })
Ben Bolker
Shykp
  • you can return the Pvalue at the end – akrun May 30 '21 at 01:11
  • Do you mean put Pvalue statement after })? I tried this and got an error: "Error in summary(fit.logist) : object 'fit.logist' not found" – Shykp May 30 '21 at 01:33
  • Not after. Before. As in `lapply(featureName,......... Pvalue<-coef(summary(fit.logist))[,'Pr(>|z|)'] Pvalue` – GuedesBF May 30 '21 at 01:41
  • Thank you. Just tried it. Still only got the first feature's P value. [[1]] (Intercept) Sex 0.0003512693 0.0002784681 – Shykp May 30 '21 at 01:45

1 Answer

I simplified your code a little bit: (1) used reformulate() (not really different, just prettier); (2) returned only the p-value for the focal variable (not the intercept p-value). (If you leave out the 2 in the row index, you'll get a 2-row matrix with intercept and focal-variable p-values.)

My example uses the built-in mtcars data set, with an added (fake) binomial response.

repeatData <- data.frame(outcome=rbinom(nrow(mtcars), size=1, prob=0.5), mtcars)
ff <-   function(var) {       
         formula    <- reformulate(var, response="outcome")
         fit.logist <- glm(formula, data = repeatData, family = binomial)
         coef(summary(fit.logist))[2, 'Pr(>|z|)'] 
       }
## skip first column (response variable).
sapply(names(repeatData)[-1], ff)
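
If you want the matrix of p-values the question asks for (intercept and focal variable for every feature), one possible variant is sketched below. It reuses the mtcars example with the fake response; the names `ff_all`, `features` and `pmat` are made up for illustration, and `set.seed()` is there only to make the fake response reproducible. Note that in the question's code `valName[3,]` picks out a single column name, so `lapply` only ever sees one feature; passing a character vector of all feature names avoids that.

```r
set.seed(101)  # only so that the fake response is reproducible
repeatData <- data.frame(outcome = rbinom(nrow(mtcars), size = 1, prob = 0.5), mtcars)

ff_all <- function(var) {
  fit <- glm(reformulate(var, response = "outcome"),
             data = repeatData, family = binomial)
  # both p-values: intercept and focal variable
  p <- coef(summary(fit))[, "Pr(>|z|)"]
  # use the same labels for every fit so the columns line up
  setNames(p, c("intercept", "feature"))
}

# every column except the response
features <- setdiff(names(repeatData), "outcome")
# sapply() returns a 2-row matrix; transpose to get one row per feature
pmat <- t(sapply(features, ff_all))
pmat
```

The transpose is just a layout choice: with 200 features, one row per feature is usually easier to read and to sort than a 2 x 200 matrix.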
Ben Bolker
  • Thank you so much for your help! I tried the code you kindly provided, and it works well with the mtcars data. But when I tried it on my data, it gave me another error: "Error in coef(summary(fit.logist))[2, "Pr(>|z|)"] : subscript out of bounds" Here’s the structure of my data. It looks much the same as the mtcars data, I just cannot figure out what’s going wrong... Outcome Sex F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 F1 0 1 2 2 1 1 2 2 1 1 2 2 1 1 1 1 1 0 0 0 1 1 0 0 2 2 – Shykp May 31 '21 at 00:41
  • If it's not working you have to try to figure out how to give us a [mcve]. A debugging tip: put a `cat(var,"\n")` in as the first line of `ff` so you can identify which element is giving you trouble - then step through `ff` one command at a time to see what's going wrong. – Ben Bolker May 31 '21 at 00:45
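
The debugging tip above can be sketched roughly as follows. The asker's data is not available, so the cause shown here is only an assumption: one plausible way to get "subscript out of bounds" is a predictor whose coefficient cannot be estimated (for example a constant column), because such coefficients are left out of `coef(summary(fit))` and only the intercept row remains. The data frame `toy` and the function `ff_debug` are hypothetical and exist only for this illustration.

```r
# hypothetical toy data: F2 is constant, so its coefficient cannot be estimated
toy <- data.frame(outcome = c(0, 1, 0, 1, 1, 0, 1, 0),
                  F1      = c(1, 2, 2, 1, 2, 1, 1, 2),
                  F2      = rep(1, 8))

ff_debug <- function(var) {
  cat(var, "\n")  # print the variable name so a failing fit can be identified
  fit <- glm(reformulate(var, response = "outcome"),
             data = toy, family = binomial)
  tab <- coef(summary(fit))
  # inestimable (NA) coefficients are omitted from the table, so row 2
  # may not exist; return NA instead of failing with "subscript out of bounds"
  if (nrow(tab) < 2) return(NA_real_)
  tab[2, "Pr(>|z|)"]
}

pvals <- sapply(c("F1", "F2"), ff_debug)
pvals
```

Returning `NA` keeps the loop running over all features, and the `NA` entries then point to the columns that need a closer look.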
  • I appreciate your help. Your code worked well with the built-in dataset and a newly created test dataset. I'm still struggling with my dataset, but I'm more than happy to accept your answer. – Shykp Jun 03 '21 at 02:14