
I'm running a regression where NJ is the treated group and PA is the control group. However, when I run the regression, R reports PA as the treated level (the coefficient is named treatedPA), which means NJ is being used as the baseline. PA is actually the untreated group and should be the baseline. How do I change this?

cardkruger = read.csv('https://raw.githubusercontent.com/bandcar/Examples/main/cardkruger.csv')
reg = lm(fte ~ t*treated, cardkruger)
summary(reg)

output:

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  17.0652     0.4986  34.224   <2e-16 ***
t             0.5075     0.7085   0.716   0.4740    
treatedPA     2.8835     1.1348   2.541   0.0112 *  
t:treatedPA  -2.9140     1.6105  -1.809   0.0708 .  
bandcar

1 Answer


There are a variety of ways to do this but

cardkruger$treated <- relevel(factor(cardkruger$treated), "PA")

is the easiest way to change the baseline (do this before running your regression). From ?relevel:

The levels of a factor are re-ordered so that the level specified by ‘ref’ is first and the others are moved down.

The factor() call converts the variable from a character vector to an unordered factor, which relevel() requires; there's no reason here for it to be ordered (the terminology of "ordered" vs "unordered" is confusing: see e.g. the question "labelling of ordered factor variable").
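As a quick check of what relevel() does, here is a minimal sketch on a made-up data frame. The `toy` object and its values are hypothetical and are not the Card-Krueger data; only the column names are borrowed from the question.

```r
# Toy data (hypothetical values, NOT the real cardkruger data)
toy <- data.frame(
  fte     = c(20, 21, 17, 18, 23, 19, 16, 22),
  t       = c(0, 1, 0, 1, 0, 1, 0, 1),
  treated = c("NJ", "NJ", "NJ", "NJ", "PA", "PA", "PA", "PA")
)

# By default factor() sorts levels alphabetically, so "NJ" comes first
# and becomes the baseline
default_levels <- levels(factor(toy$treated))

# Convert to a factor and move "PA" to the front (the reference level)
toy$treated <- relevel(factor(toy$treated), ref = "PA")
new_levels <- levels(toy$treated)

# The coefficient names now refer to NJ rather than PA
coef_names <- names(coef(lm(fte ~ t * treated, data = toy)))
print(default_levels)
print(new_levels)
print(coef_names)
```

With PA as the reference level, the interaction term should appear as t:treatedNJ, which is the difference-in-differences coefficient for the treated state.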

If you like tidyverse you can do

library(readr)
library(forcats)
library(dplyr)
# fct_relevel() accepts a character vector and moves "PA" to the first (reference) level
cardkruger <- read_csv('https://raw.githubusercontent.com/bandcar/Examples/main/cardkruger.csv') |>
   mutate(treated = fct_relevel(treated, "PA"))
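Another base-R option, as a sketch, is to list the levels explicitly when creating the factor. The vector `treated_chr` below is a hypothetical stand-in for the treated column, not data from the question.

```r
# Hypothetical stand-in for the character `treated` column
treated_chr <- c("NJ", "PA", "NJ", "PA")

# Listing the levels explicitly puts "PA" first, so it becomes the reference level
treated_fac <- factor(treated_chr, levels = c("PA", "NJ"))
print(levels(treated_fac))

# This gives the same level order as relevel(factor(...), ref = "PA")
same_order <- identical(
  levels(treated_fac),
  levels(relevel(factor(treated_chr), ref = "PA"))
)
print(same_order)
```

This form is handy when there are more than two groups and you want to control the full ordering, not just the baseline.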
Ben Bolker
  • it says `'relevel' only for (unordered) factors` – bandcar Nov 14 '22 at 01:47
  • Must do `cardkruger$treated <- relevel(factor(cardkruger$treated), "PA")` because the variable is "character" – Ric Nov 14 '22 at 01:50
  • thank you! I'll accept the answer once the waiting period is over. It says two more minutes before I can accept an answer. – bandcar Nov 14 '22 at 01:52