
I am a complete beginner in R/RStudio, coding, and statistics in general.

In R, I am running a GLM (logistic regression) where my response (Y) is a binary no/yes (0/1) variable and my predictor (X) is Sex (female/male).

So I have run the following script:

    hello <- read.csv(file.choose())
    hello$sexbin <- ifelse(hello$Sex == 'm',0,ifelse(hello$Sex == 'f',1,NA))
    modifhello <- subset(hello,hello$Combi_cag_long>=36)
    model1 <- glm(modifhello$VAB~modifhello$Sex, family=binomial(link=logit),
                  na.action=na.exclude, data=modifhello)
    summary.lm(model1)

However, in my output R seems to have split Sex into two separate terms (`modifhello$Sexf` and `modifhello$Sexm`), suggesting that it is not treating it as a proper binary variable:

    Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
    (Intercept)       -3.689      1.009  -3.656 0.000258 ***
    modifhello$Sexf    2.506      1.010   2.482 0.013084 *
    modifhello$Sexm    2.922      1.010   2.894 0.003820 **
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
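For illustration, a minimal sketch with made-up data of how a character `Sex` column can end up as two dummy terms. It assumes, as Roland suggests in the comments, that the column holds a third value besides `f` and `m`; a blank string is used here purely as a hypothetical example. R turns the column into a factor with the levels in sorted order, uses the first level as the reference, and creates one 0/1 column for each remaining level (treatment contrasts).

```r
# Made-up toy data (assumption): Sex contains "f", "m" and one blank "" entry
# standing in for an unexpected third value.
toy <- data.frame(
  VAB = c(0, 1, 0, 1, 1, 0, 1),
  Sex = c("f", "m", "f", "m", "", "f", "m")
)

# The character column becomes a factor with three levels: "", "f", "m".
# The blank sorts first, so it becomes the reference level.
toy_levels <- levels(factor(toy$Sex))
print(toy_levels)

# Treatment contrasts: an intercept plus one dummy column for each
# non-reference level, i.e. "Sexf" and "Sexm".
mm <- model.matrix(~ Sex, data = toy)
print(colnames(mm))
```

If the real data contain such a third level, this would fit the pattern in the output above, where both `Sexf` and `Sexm` get their own coefficient instead of one of them being absorbed into the intercept.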

What do I need to add to my script to correct this?

FOUND THE SOLUTION

I simply needed to use `modifhello$VAB ~ modifhello$sexbin` instead of `modifhello$VAB ~ modifhello$Sex` (the latter is the old column).
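A sketch of what switching to `sexbin` changes, using made-up data and assuming (hypothetically) that one `Sex` value is blank. The nested `ifelse()` from the question maps `m` to 0, `f` to 1 and anything else to `NA`. Because `sexbin` is numeric, the model gets a single slope, and rows where it is `NA` are left out of the fit rather than getting their own coefficient.

```r
# Made-up data (assumption): row 3 has a blank Sex value.
d <- data.frame(
  VAB = c(0, 1, 1, 0, 1, 0, 0, 1),
  Sex = c("f", "m", "", "f", "m", "f", "m", "f")
)

# Recoding as in the question: "m" -> 0, "f" -> 1, anything else -> NA.
d$sexbin <- ifelse(d$Sex == "m", 0, ifelse(d$Sex == "f", 1, NA))

# Plain column names in the formula; the data frame goes in `data`.
fit <- glm(VAB ~ sexbin, family = binomial(link = "logit"),
           na.action = na.exclude, data = d)

print(names(coef(fit)))        # only the intercept and "sexbin"
print(nrow(model.frame(fit)))  # the row with the blank Sex value is not used
```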

123
  • It should be `glm(VAB ~ Sex, family=binomial(link=logit), na.action=na.exclude, data=modifhello)`. Never use `$` within a model formula. Since `Sex` seems to be a character variable, R should automatically treat it as categorical and apply treatment contrasts. If you still see two betas, you have a third level in that variable (which nowadays would not be unusual). What does `table(modifhello$Sex)` tell you? – Roland Nov 11 '20 at 14:41
  • And in case it's not clear: you don't need to do manual dummy encoding. And you don't actually use `sexbin` in the model. – Roland Nov 11 '20 at 14:42
  • Thank you for your reply. I have found the solution: although I added my new binary column (`sexbin`), I was still running the model using the old column (`Sex`). I fixed this by rewriting the model as: `glm(modifhello$VAB~modifhello$sexbin)` – 123 Nov 12 '20 at 15:50
  • You have not understood anything I've tried to teach you, have you? – Roland Nov 12 '20 at 20:32
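A sketch of the approach Roland describes: inspect the values with `table()`, then fit the model with plain column names and the data frame passed through `data`. The data are made up, and dropping rows whose `Sex` is neither `f` nor `m` is only one assumed way of handling a stray third value; whether that is appropriate depends on what those rows actually are.

```r
# Made-up data (assumption) with one unexpected blank Sex entry.
modifhello <- data.frame(
  VAB = c(0, 1, 1, 0, 1, 0, 0, 1),
  Sex = c("f", "m", "", "f", "m", "f", "m", "f")
)

# Step 1: check which values Sex actually takes; a count for "" here
# would reveal the hidden third level.
print(table(modifhello$Sex))

# Step 2 (one possible choice): keep only "f" and "m" and make Sex a factor
# with "m" as the reference level, matching the 0/1 coding of sexbin.
modifhello <- subset(modifhello, Sex %in% c("f", "m"))
modifhello$Sex <- factor(modifhello$Sex, levels = c("m", "f"))

# Step 3: no `$` in the formula; R builds the dummy variable itself.
model1 <- glm(VAB ~ Sex, family = binomial(link = "logit"),
              na.action = na.exclude, data = modifhello)

# summary() dispatches to summary.glm() for a glm object.
summary(model1)
```

`summary()` reports z-based Wald tests for this fit, whereas `summary.lm()`, as used in the question, is the summary method for `lm` fits.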
