Change order of categorical variable and reference category using lm

Question

I have an unordered categorical variable (event_time) with 5 different options ("future", "past", "prebirth", "never", "uncertain") as a predictor variable, and I want to make "never" the reference category (ideally without transforming the variable). I'm just using lm and then texreg::screenreg(list(m1, m2, m3)) to compare output for models with different outcome variables but the same predictor.

If there's a way to rearrange the order in which the categories show up in the model output (perhaps within screenreg?), that'd be wonderful.

And an added bonus if this can all be done without transforming the variable or dealing with factors (I know how to do this with relevel if the variable were already a factor)... thanks much.

Some data:

structure(list(yvar = c(4.43024525984776, -3.01051231657988, 
4.70993862460106, -2.03636967067474, -1.09802960848352, -1.16527740798651, 
5.6002805983151, -7.03524067599639, 1.02474010023752, 0.647438645180132
), event_time = c(NA, "Pre", "Future", "Time unknown", "Future", "Future", NA, 
"Never", NA, "Never"), race = c("Black", "Black", "White", "Black", 
"Black", "Black", "Black", "White", "Black", "White"), log_parent_income = c(4.0073333, 
NA, 3.8066626, 2.1972246, 0.69314718, 4.2484951, 3.9120231, 1.9459101, 
2.3025851, 3.8066626)), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

And then just doing a simple lm(yvar ~ event_time + log_parent_income + race ... model.

By far the easiest way to do this is by transforming the factor variable - I'm sure it's *possible* to do it otherwise, but a pain. (For example, you could set up custom contrasts.) Can you clarify why that's *not* an acceptable/preferred solution? You can do it on the fly if you like, i.e. `lm(..., data= transform(orig_df, event_time=relevel(event_time, "never")))` ... — Ben Bolker, Apr 16 '21 at 23:20
For the ordering in `screenreg`, how about the `reorder.coef` argument? (Can we have a [mcve] ... ?) — Ben Bolker, Apr 16 '21 at 23:22
I mostly just dislike dealing with factors (I am a Stata person typically and have never gotten the hang of factors). Added reproducible example to the question. — g_t_b, Apr 16 '21 at 23:37
Can you clarify how you want to reorder the categories in the output (e.g. what order, other than alphabetical-after-reference-level, do you prefer)? Is that separate from making "Never" the baseline level? — Ben Bolker, Apr 17 '21 at 00:00
@BenBolker making "Never" the baseline is the real priority. But the ideal order they'd show up in the output would be "Future", "Pre", "Time Unknown" — g_t_b, Apr 17 '21 at 00:16

Onyambu · Answer 1 · 2021-04-17T01:29:39.840

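In base R, you can change the contrasts directly in the regression call:

 lm(yvar ~ C(event_time, base = 2) + log_parent_income + race, data = df)

This works if you know the index of the level you want as the base (in the sample data, "Never" is the second level in alphabetical order).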
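If the index is not known in advance, it can be looked up rather than counted by hand. A minimal sketch, assuming a toy character vector (hypothetical, not the asker's data) that uses the same category labels:

```r
# Hypothetical toy values mirroring the labels in the question's sample data
event_time <- c("Pre", "Future", "Time unknown", "Future", "Never", "Never")

# factor() sorts the unique values to build its levels
lv <- levels(factor(event_time))
print(lv)

# Position of the desired reference level; this is the number to pass as `base`
base_idx <- match("Never", lv)
print(base_idx)
```

The value of `base_idx` is what would go into `C(event_time, base = ...)`.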

If you know that the reference level is the last one, then you can do:

 lm(yvar ~ event_time + log_parent_income + race, data = df, 
     contrasts = list(event_time = "contr.SAS"))

If you want the same behavior for several variables, you can change the global option instead:

 options(contrasts = rep("contr.SAS",2))
 lm(yvar ~ event_time + log_parent_income + race, data = df)

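This assumes that "Never" is the last level (in the sample data it is the second of four levels, so contr.SAS would pick "Time unknown" as the reference instead). You can also play with the base argument of contr.treatment to set the reference to any level index you want.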
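One way to use the `base` argument of `contr.treatment` is to build the contrast matrix up front and pass it through `contrasts =`. This is only a sketch on a hypothetical toy data frame (`toy` and its values are made up for illustration), not the asker's data:

```r
# Hypothetical toy data: two observations per category, no missing values
toy <- data.frame(
  yvar = c(1, 2, 3, 4, 5, 6, 7, 8),
  event_time = rep(c("Future", "Never", "Pre", "Time unknown"), 2)
)

# Levels as lm() will see them (sorted unique values)
lv <- levels(factor(toy$event_time))

# Treatment contrasts with "Never" as the omitted (reference) level;
# passing the level names keeps them as column names of the matrix
cm <- contr.treatment(lv, base = match("Never", lv))

fit <- lm(yvar ~ event_time, data = toy,
          contrasts = list(event_time = cm))
print(names(coef(fit)))
```

With a single categorical predictor, the intercept of `fit` is the mean of `yvar` in the "Never" group, which is a quick way to check that the intended reference level was picked.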

Lastly, you can write a function that takes the base argument as a character string:

C1 <- function (object, contr, how.many, ...) 
{
  base <- list(...)$base
  # translate a level name such as "Never" into its position among the levels
  if(!is.null(base) && is.character(base))
      base <- match(base, levels(factor(object)))
  C(object, base = base)
}

Then you could use it as:

lm(yvar~C1(event_time, base = "Never"), df)

Is that not enough? You could also pass a function to the contrasts argument. With this, I believe the names will be maintained.

This is nice. I am continually amazed at how many **different** ways there are to set contrasts in R (not necessarily a good thing!): `C`, `contrasts<-`, `options(contrasts=...)`, `lm(..., contrasts=...)` (is that all?) — Ben Bolker, Apr 17 '21 at 01:39

Ben Bolker · Answer 2 · 2021-04-17T01:37:56.983

I don't know if this will make you happy or not, but here goes.

A helper function that will be useful for reordering:

match_pattern <- function(regex, target) {
    sapply(regex, function(x) {
        g <- grep(x,target)
        if (length(g)==0) return(NA)
        if (length(g)>1) stop("multiple matches")
        return(g)
    })
}

Fit the model. Here I'm using the forcats package because fct_relevel is less fussy about accepting a character vector (i.e. I don't need relevel(factor(event_time), "Never")).

m1 <- lm(yvar~event_time,
         data=transform(dd, event_time=forcats::fct_relevel(event_time,"Never")))
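For comparison, the base-R version mentioned above needs the explicit `factor()` call because `relevel()` does not accept a character vector. A minimal sketch on a hypothetical toy data frame (`dd_toy` is made up for illustration):

```r
# Hypothetical toy data: two observations per category, no missing values
dd_toy <- data.frame(
  yvar = c(1, 2, 3, 4, 5, 6, 7, 8),
  event_time = rep(c("Future", "Never", "Pre", "Time unknown"), 2)
)

# Convert to factor and move "Never" to the front, on the fly;
# dd_toy itself is left unchanged
m_base <- lm(yvar ~ event_time,
             data = transform(dd_toy,
                              event_time = relevel(factor(event_time), ref = "Never")))
print(names(coef(m_base)))
```

The original data frame still holds a plain character column afterwards, since the conversion only happens inside the `transform()` call.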

If you like the tidyverse you can make it slightly more compact:

library(dplyr)    # %>%, mutate(), across()
library(forcats)  # fct_relevel()
dd %>% mutate(across(event_time, ~fct_relevel(.,"Never"))) %>%
     lm(formula=yvar~event_time)

Now texreg::screenreg(m1) will actually output the coefficients in your preferred order ("Future", "Pre", "Time unknown") because it happens to be alphabetical. If you wanted to change the order to something else you could:

ref_order <-  c("(Intercept)", "Time unknown", "Future", "Pre")
pp <- match_pattern(ref_order,names(coef(m1)))
texreg::screenreg(m1, reorder.coef=pp)
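To see what the helper hands to `reorder.coef`, it can be run on a plain character vector without fitting anything. This sketch repeats the helper so it is self-contained; `coef_names` is a hypothetical stand-in for `names(coef(m1))` when "Never" is the reference level:

```r
# Same helper as above: position of each pattern's single match in `target`
match_pattern <- function(regex, target) {
    sapply(regex, function(x) {
        g <- grep(x, target)
        if (length(g) == 0) return(NA)
        if (length(g) > 1) stop("multiple matches")
        return(g)
    })
}

# Hypothetical coefficient names, as lm() would label them
coef_names <- c("(Intercept)", "event_timeFuture",
                "event_timePre", "event_timeTime unknown")

ref_order <- c("(Intercept)", "Time unknown", "Future", "Pre")
pp <- match_pattern(ref_order, coef_names)
print(unname(pp))
```

Each pattern is treated as a regular expression, so a pattern that matches more than one coefficient name stops with an error, and one that matches nothing yields `NA`.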

~~While it would theoretically be possible to do what you want without touching the data set (by setting up a custom contrast), I think it would be considerably harder.~~ In the long run trying to work in a language without adopting its idioms can be tough — you might try figuring out what you don't like about factors and trying to address it (the forcats package can be helpful for some tasks).

I do indeed like the Tidyverse...and forcats seems to make things a lot easier. The texreg::screenreg code is also exactly what I was looking for. Thanks a lot for the help and the explanations, Ben! — g_t_b, Apr 17 '21 at 00:44
