labelling of ordered factor variable

Question

I am trying to produce a univariate output table using the gtsummary package.

publication_dummytable1_sum <- structure(list(id = 1:10, age = structure(c(3L, 3L, 2L, 3L, 2L, 
2L, 2L, 1L, 1L, 1L), .Label = c("c", "b", "a"), class = c("ordered", 
"factor")), sex = structure(c(2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 
1L, 2L), .Label = c("F", "M"), class = "factor"), country = structure(c(1L, 
1L, 1L, 1L, 3L, 3L, 3L, 2L, 2L, 2L), .Label = c("eng", "scot", 
"wale"), class = "factor"), edu = structure(c(1L, 1L, 1L, 2L, 
2L, 2L, 3L, 3L, 3L, 3L), .Label = c("x", "y", "z"), class = "factor"), 
lungfunction = c(45L, 23L, 25L, 45L, 70L, 69L, 90L, 50L, 
62L, 45L), ivdays = c(15L, 26L, 36L, 34L, 2L, 4L, 5L, 8L, 
9L, 15L), no2 = c(40L, 70L, 50L, 60L, 30L, 25L, 80L, 89L, 
10L, 40L), pm25 = c(15L, 20L, 36L, 48L, 25L, 36L, 28L, 15L, 
25L, 15L)), row.names = c(NA, 10L), class = "data.frame")

...
library(gtsummary)
publication_dummytable1_sum %>%
  select(sex, age, lungfunction, ivdays) %>%
  tbl_uvregression(
    method = lm,
    y = lungfunction,
    pvalue_fun = ~style_pvalue(.x, digits = 3)
  ) %>%
  add_global_p() %>%  # add global p-value
  bold_p() %>%        # bold p-values under a given threshold
  bold_labels()
...

When I run this code I get the output below. The issue is the labeling of the ordered factor variable (age). R chooses its own labeling for the ordered factor variable. Is it possible to tell R not to choose its own labeling for ordered factor variables?

I want output like the following:

if either of the answers below solved your problem you are encouraged to click on the check-mark to accept one of them. — Ben Bolker, Jul 25 '21 at 11:33

Ben Bolker · Accepted Answer · 2021-07-22T20:26:05.117

Like many other people, I think you might be misunderstanding the meaning of an "ordered" factor in R. All factors in R are ordered, in a sense; the estimates etc. are typically printed, plotted, etc. in the order of the levels vector. Specifying that a factor is of type ordered has two major effects:

it allows you to evaluate inequalities on the levels of the factor (e.g. you can filter(age > "b"))
the contrasts are set by default to orthogonal polynomial contrasts, which is where the L (linear) and Q (quadratic) labels come from: see e.g. this CrossValidated answer for more details.
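
As a minimal sketch of these two effects, the base R snippet below uses a small made-up ordered factor (the vector `age` here is illustrative, not the asker's data). It shows that inequalities are defined on an ordered factor and that its default contrasts carry the `.L` and `.Q` column names, which is where `age.L` and `age.Q` in a model summary come from.

```r
# Hypothetical toy vector: levels ordered a < b < c
age <- factor(c("a", "b", "c", "b", "a"),
              levels = c("a", "b", "c"), ordered = TRUE)

# Effect 1: inequalities are meaningful for ordered factors
above_b <- age > "b"
above_b

# Effect 2: the default contrasts are orthogonal polynomials,
# whose columns are named .L (linear) and .Q (quadratic)
poly_names <- colnames(contrasts(age))
poly_names
```

With an unordered factor the same comparison returns `NA` with a warning, and `contrasts()` returns treatment contrasts named after the non-baseline levels.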

If you want this variable treated in the same way as a regular factor (so that the estimates are made for differences of groups from the baseline level, i.e. treatment contrasts), you can:

convert back to an unordered factor (e.g. factor(age, ordered=FALSE))
specify that you want to use treatment contrasts in your model (in base R you would specify contrasts = list(age = "contr.treatment"))
set options(contrasts = c(unordered = "contr.treatment", ordered = "contr.treatment")) (the default for ordered is "contr.poly")
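
The three options above can be sketched in base R as follows. This is only an illustration: the data frame `d` reuses the `age` and `lungfunction` columns from the question, and the object names (`d`, `fit_default`, `fit1`, `fit2`, `fit3`) are made up for this example. Each option should give coefficients named after the factor levels instead of `.L` and `.Q`.

```r
# age and lungfunction taken from the question's data (levels c < b < a)
d <- data.frame(
  lungfunction = c(45, 23, 25, 45, 70, 69, 90, 50, 62, 45),
  age = factor(c("a", "a", "b", "a", "b", "b", "b", "c", "c", "c"),
               levels = c("c", "b", "a"), ordered = TRUE)
)

# Default for an ordered factor: polynomial contrasts (age.L, age.Q)
fit_default <- lm(lungfunction ~ age, data = d)

# Option 1: drop the "ordered" class (level order is kept)
d1 <- transform(d, age = factor(age, ordered = FALSE))
fit1 <- lm(lungfunction ~ age, data = d1)

# Option 2: request treatment contrasts for this model only
fit2 <- lm(lungfunction ~ age, data = d,
           contrasts = list(age = "contr.treatment"))

# Option 3: change the session-wide default, then restore it
old <- options(contrasts = c(unordered = "contr.treatment",
                             ordered = "contr.treatment"))
fit3 <- lm(lungfunction ~ age, data = d)
options(old)

names(coef(fit_default))
names(coef(fit1))
```

Option 3 affects every model fitted afterwards in the session, so restoring the previous setting with `options(old)` is a reasonable precaution.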

If you have an unordered ("regular") factor and the levels are not in the order you want, you can reset the level order by specifying the levels explicitly, e.g.

mutate(age = factor(age,
   levels = c("0-10 years", "11-20 years", "21-30 years", "30-40 years")))

R puts factor levels in alphabetical order by default, which is sometimes not what you want (but I can't think of a case where the order would be 'random' ...)
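
A small base R sketch of that default, using hypothetical labels ("low", "medium", "high") chosen only because their alphabetical order differs from their natural order:

```r
# Hypothetical character data
x <- c("medium", "low", "high", "low")

# Default: levels are sorted alphabetically
default_levels <- levels(factor(x))
default_levels

# Explicit levels fix the display order without making the factor "ordered"
f <- factor(x, levels = c("low", "medium", "high"))
levels(f)
is.ordered(f)
```

Because `f` is still a regular factor, a model would use treatment contrasts with "low" as the baseline.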

@Ben, many thanks for your prompt reply and answer. Actually, I have age as a factor variable with 4 levels (0-10 years, 11-20 years, 21-30 and 30-40 years). When I run any summary table or regression, R produces the output for these categories (age groups) in random order, so I ordered the levels of the age variable. But now in the gtsummary output I get R's own labelling for these categories. — skpak, Jul 22 '21 at 20:00
See the update to my answer. I very much doubt "random order" ... ? — Ben Bolker, Jul 22 '21 at 20:26

score 2 · Answer 2 · answered Jul 22 '21 at 19:35

The easiest way to remove the odd labelling for ordered variables is to remove the ordered class from these factor variables. Example below!

library(gtsummary)
library(tidyverse)
packageVersion("gtsummary")
#> [1] '1.4.2'

publication_dummytable1_sum <-
  structure(
    list(
      id = 1:10,
      age = structure(c(3L, 3L, 2L, 3L, 2L, 2L, 2L, 1L, 1L, 1L),
                      .Label = c("c", "b", "a"), class = c("ordered", "factor")),
      sex = structure(c(2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L),
                      .Label = c("F", "M"), class = "factor"),
      country = structure(c(1L, 1L, 1L, 1L, 3L, 3L, 3L, 2L, 2L, 2L),
                          .Label = c("eng", "scot", "wale"), class = "factor"),
      edu = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
                      .Label = c("x", "y", "z"), class = "factor"),
      lungfunction = c(45L, 23L, 25L, 45L, 70L, 69L, 90L, 50L, 62L, 45L),
      ivdays = c(15L, 26L, 36L, 34L, 2L, 4L, 5L, 8L, 9L, 15L),
      no2 = c(40L, 70L, 50L, 60L, 30L, 25L, 80L, 89L, 10L, 40L),
      pm25 = c(15L, 20L, 36L, 48L, 25L, 36L, 28L, 15L, 25L, 15L)
    ),
    row.names = c(NA, 10L), class = "data.frame"
  ) |>
  as_tibble()

# R labels the terms of an ordered factor like this in lm()
lm(lungfunction ~ age, publication_dummytable1_sum)
#> 
#> Call:
#> lm(formula = lungfunction ~ age, data = publication_dummytable1_sum)
#> 
#> Coefficients:
#> (Intercept)        age.L        age.Q  
#>       51.17       -10.37       -15.11


tbl <-
  publication_dummytable1_sum %>% 
  # remove ordered class
  mutate(across(where(is.ordered), ~factor(., ordered = FALSE))) %>%
  select(sex, age, lungfunction, ivdays) %>% 
  tbl_uvregression(
    method = lm,
    y = lungfunction,
    pvalue_fun = ~style_pvalue(.x, digits = 3)
  )

Created on 2021-07-22 by the reprex package (v2.0.0)

@Daniel, many thanks for your prompt reply and answer. Actually, I have age as a factor variable with 4 levels (0-10 years, 11-20 years, 21-30 and 30-40 years). When I run any summary table or regression, R produces the output for these categories (age groups) in random order, so I ordered the levels of the age variable. But now in the gtsummary output I get R's own labelling for these categories. — skpak, Jul 22 '21 at 19:58
See Ben's very helpful answer....you don't need ordered factors. The categories are not in a random order, and you can set the order without making the class an ordered factor. — Daniel D. Sjoberg, Jul 22 '21 at 20:12
@Ben and @Daniel, many thanks, that's really helpful. Have a great evening ahead. — skpak, Jul 22 '21 at 20:47
