I have a linear model of class "lm" that I am inspecting with summary(). A toy version is:

fit <- lm(Strength ~ Age + Sex, data = mydata)

summary(fit)

Here, Age is a continuous variable while Sex is a categorical variable. The relevant part of the summary(fit) output looks like:

             Estimate 
(Intercept)  -1.838e-01
Age          -5.264e-03
Sex.L        3.260e-01

How should I interpret this, specifically the categorical variable? I understand this to mean:

Strength = -0.1838 + (-0.005264 * Age) + (0.326 * Sex)

but is this correct, and what value would Sex take? 1 for one sex, and 0 for the other? And how should I check which sex takes the value 1? Since my factor levels for Sex are Male and Female, I assume .L is a dummy variable for one of them, but I don't know how to check this.

Any advice would be much appreciated.

Thank you very much.

Zheyuan Li
PhelsumaFL

1 Answer

The name for the coefficient is "Sex.L"! This implies that Sex is an ordered categorical variable, and polynomial contrast encoding instead of treatment encoding was used. In this case, Sex in your equation does not simply take 0 or 1.
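
To see what values Sex takes under polynomial contrasts, one can inspect the contrast matrix directly. The following is a minimal sketch using base R's contr.poly(); the variable names cm and codes are only illustrative.

```r
# Polynomial contrast matrix for a factor with 2 ordered levels.
cm <- contr.poly(2)
print(cm)

# The single column is named ".L". The 1st level is coded -1/sqrt(2)
# and the 2nd level +1/sqrt(2), so Sex is not a 0/1 dummy here.
codes <- as.numeric(cm[, ".L"])
print(codes)
```

So with an ordered two-level factor, the "Sex.L" coefficient multiplies roughly -0.707 for the 1st level and +0.707 for the 2nd level.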


You really need to convert this ordered factor to the usual factor first:

mydata$Sex <- factor(mydata$Sex, ordered = FALSE)

You can check levels(mydata$Sex) at this stage. The 1st level becomes the reference (baseline) level and gets no coefficient of its own; the reported Sex coefficient is the effect of the 2nd level relative to the 1st. Note that using a different contrast will result in different coefficients.
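
As a quick check of which level is coded 0 and which is coded 1, levels() and contrasts() can be used together. This is a sketch on a small made-up factor (sex_ordered and sex_plain are hypothetical names, not OP's data), and it assumes R's default contrast options are in effect.

```r
# Hypothetical toy factor, ordered like the one in the question.
sex_ordered <- factor(c("Female", "Male", "Male", "Female"),
                      levels = c("Female", "Male"), ordered = TRUE)

# Drop the ordering; the order of the levels themselves is kept.
sex_plain <- factor(sex_ordered, ordered = FALSE)

is.ordered(sex_plain)
levels(sex_plain)

# Treatment contrast matrix: rows are levels, the column is the dummy.
# The row with 0 is the reference level, the row with 1 is the level
# that the reported coefficient refers to.
trt <- contrasts(sex_plain)
print(trt)
```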


You can also control levels to be in your desired order, say:

mydata$Sex <- factor(mydata$Sex, levels = c("Male", "Female"), ordered = FALSE)

Note that changing the order of levels gives different regression coefficients, too.


Anyway, as long as Sex is the usual factor (i.e., is.ordered(mydata$Sex) is FALSE), R's default treatment contrast encoding will be applied. The 1st level is coded as 0, while the 2nd level is coded as 1. Suppose the fitted model coefficients are a, b and c, then the equation will be:

Strength = a + b * Age + c * Sex

where Sex is 0 for the 1st level, and 1 for the 2nd level.
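
The equation can be checked by evaluating it by hand and comparing with predict(). The sketch below uses a small made-up data frame (toy, with invented values) rather than OP's data, and assumes default treatment contrasts.

```r
# Hypothetical toy data; the values are made up for illustration only.
toy <- data.frame(
  Strength = c(-0.45, -0.50, -0.56, -0.61, -0.01, -0.06, -0.10, -0.17),
  Age      = c(10, 20, 30, 40, 10, 20, 30, 40),
  Sex      = factor(rep(c("Female", "Male"), each = 4),
                    levels = c("Female", "Male"))
)

fit <- lm(Strength ~ Age + Sex, data = toy)
co <- coef(fit)  # names: "(Intercept)", "Age", "SexMale"

# Evaluate Strength = a + b * Age + c * Sex by hand for a 25-year-old male.
sex_dummy <- 1   # 0 for the 1st level (Female), 1 for the 2nd level (Male)
by_hand <- co[["(Intercept)"]] + co[["Age"]] * 25 + co[["SexMale"]] * sex_dummy

# The same prediction via predict().
new_obs <- data.frame(Age = 25,
                      Sex = factor("Male", levels = levels(toy$Sex)))
from_predict <- predict(fit, newdata = new_obs)

all.equal(by_hand, unname(from_predict))
```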


A bit of background:

The "L" in "Sex.L" means "Linear", which is an indication of polynomial contrast. If the factor has 4 levels instead, we will see "L" (Linear), "Q" (Quadratic) and "C" (Cubic).
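
The suffixes can be seen directly from the contrast matrix and from the coefficient names of a fitted model. This is a sketch with a hypothetical four-level ordered factor called dose and invented response values y.

```r
# Column names of the polynomial contrast matrix for 4 levels.
cm4 <- contr.poly(4)
colnames(cm4)

# Hypothetical ordered factor with 4 levels and a made-up response.
dose <- factor(rep(c("low", "mid", "high", "max"), times = 2),
               levels = c("low", "mid", "high", "max"), ordered = TRUE)
y <- c(1.0, 2.1, 2.9, 4.2, 1.1, 1.9, 3.1, 3.8)

# Coefficient names carry the same .L / .Q / .C suffixes.
coef_names <- names(coef(lm(y ~ dose)))
print(coef_names)
```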

However, if Sex is the usual factor, the reported name should be "SexMale" or "SexFemale". Yes, this is informative enough.

  • If we see "SexMale", then "Female" is the 1st level, so in the equation, Sex is 0 for Female and 1 for Male.

  • If we see "SexFemale", then "Male" is the 1st level, so in the equation, Sex is 0 for Male and 1 for Female.

This naming convention for categorical variables after contrast encoding is very helpful.


A reproducible example

Since OP did not provide a reproducible example, I decided to simulate a dataset (where Sex is an ordered factor) to help readers follow what I said above.

mydata <- structure(list(Strength = c(-0.4484, -0.4584, -0.4765, -0.4676, 
-0.4979, -0.507, -0.5094, -0.5071, -0.5046, -0.5346, -0.5302, 
-0.5298, -0.5354, -0.5489, -0.5646, -0.5858, -0.5731, -0.5368, 
-0.5418, -0.5521, -0.5967, -0.5826, -0.5751, -0.5914, -0.6069, 
-0.5831, -0.6045, -0.6111, -0.618, -0.6375, -0.634, -0.6212, 
-0.6496, -0.6387, -0.6387, -0.6695, -0.6413, -0.6499, -0.6763, 
-0.6826, -0.6579, -0.7051, -0.6982, -0.7004, -0.7101, -0.6964, 
-0.6958, -0.7583, -0.7247, -0.7117, -0.7328, -0.0037, -0.003, 
-0.0095, 0.0137, -0.0228, -0.025, -0.0339, -0.041, -0.0271, -0.0303, 
-0.0633, -0.0572, -0.0542, -0.0648, -0.087, -0.0983, -0.0625, 
-0.0832, -0.0776, -0.1046, -0.1158, -0.1331, -0.1137, -0.1288, 
-0.1366, -0.1538, -0.1346, -0.1348, -0.1698, -0.1726, -0.1798, 
-0.1888, -0.1735, -0.1724, -0.183, -0.2001, -0.2029, -0.1812, 
-0.2126, -0.2086, -0.2278, -0.2279, -0.2294, -0.208, -0.2575, 
-0.258, -0.2356, -0.2417, -0.2406, -0.2683, -0.2914), Age = c(10L, 
11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 
24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 
37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 
50L, 51L, 52L, 53L, 54L, 55L, 56L, 57L, 58L, 59L, 60L, 10L, 11L, 
12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 
25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 
38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 
51L, 52L, 53L, 54L, 55L, 56L, 57L, 58L, 59L, 60L), Sex = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L), levels = c("Female", "Male"), class = c("ordered", 
"factor"))), row.names = c(NA, -102L), class = "data.frame")

A model like the one OP got:

fit1 <- lm(Strength ~ Age + Sex, data = mydata)
#(Intercept)          Age        Sex.L  
#  -0.178985    -0.005426     0.329874  

is.ordered(mydata$Sex)
#[1] TRUE

Convert Sex to the usual factor:

mydata$Sex <- factor(mydata$Sex, ordered = FALSE)

is.ordered(mydata$Sex)
#[1] FALSE

levels(mydata$Sex)
#[1] "Female" "Male"  

fit2 <- lm(Strength ~ Age + Sex, data = mydata)
#(Intercept)          Age      SexMale  
#  -0.412241    -0.005426     0.466512  

Control order of levels:

mydata$Sex <- factor(mydata$Sex, levels = c("Male", "Female"), ordered = FALSE)

is.ordered(mydata$Sex)
#[1] FALSE

levels(mydata$Sex)
#[1] "Male"   "Female"

fit3 <- lm(Strength ~ Age + Sex, data = mydata)
#(Intercept)          Age    SexFemale  
#   0.054270    -0.005426    -0.466512  
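
The three fits describe the same model; only the coding of Sex differs. As a rough arithmetic check using the rounded coefficients printed above (so agreement is only up to rounding), the fit1 values can be converted to the fit2 values with the polynomial codes -1/sqrt(2) and +1/sqrt(2):

```r
# Rounded coefficients copied from the fit1 output above.
int_1 <- -0.178985   # (Intercept) in fit1
sex_L <-  0.329874   # Sex.L in fit1

# Female is coded -1/sqrt(2) and Male +1/sqrt(2), so the Male minus
# Female difference is sex_L * (2 / sqrt(2)) = sex_L * sqrt(2).
male_minus_female <- sex_L * sqrt(2)

# Intercept for the Female baseline, as in fit2.
female_intercept <- int_1 - sex_L / sqrt(2)

round(c(male_minus_female, female_intercept), 4)
```

These agree with SexMale and (Intercept) in fit2, while fit3 simply flips the sign of the Sex coefficient and moves the baseline to Male.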

Further reading

(I was made aware of these Q & A just now.)

Zheyuan Li
  • Dear Zheyuan Li, thank you for this helpful response. Yes, I was thrown by seeing Sex.L rather than seeing something like "SexMale" and was not sure how to interpret it. This makes sense and I will change the factors as you suggest. Hopefully this detailed answer will also prove useful to others in the future. Thank you again. – PhelsumaFL Jul 28 '22 at 10:49
  • After trying mydata$Sex <- factor(mydata$Sex), levels(mydata$Sex) prints [1] "Female" "Male" and is.ordered(mydata$Sex) gives [1] TRUE. I then tried mydata$Sex <- factor(mydata$Sex, ordered = F), which changes the output of is.ordered(mydata$Sex) to [1] FALSE. However, summary(fit) still shows "Sex.L" rather than, e.g., "SexMale". Any thoughts? – PhelsumaFL Jul 28 '22 at 11:28
  • N.B. there are no indications of the other terms described by Zheyuan Li, i.e., there are no "Sex.Q" or "Sex.C" coefficient rows. I am not sure whether this is relevant or not. – PhelsumaFL Jul 28 '22 at 11:29
  • @PhelsumaFL I wrote: **If the factor has 4 levels instead**, we will see "L" (Linear), "Q" (Quadratic) and "C" (Cubic). – Zheyuan Li Jul 28 '22 at 11:52
  • @PhelsumaFL You need to refit your model after updating `Sex` in your data. Again, had you provided a reproducible example, you would not have been delayed in getting a working solution. Learn this lesson and accept the answer. – Zheyuan Li Jul 30 '22 at 19:39
  • I don't take orders from you. Learn some manners. – PhelsumaFL Aug 01 '22 at 09:17