Categorical variable coding with 0s and 1s giving different outputs in R

Question

What I did: I did a linear mixed effect model analysis in R with nlme library. I have a categorical fixed variable, Blurriness, with 2 levels: B standing for Blurred, N standing for Non-Blurred. Upon suggestion, I changed them into, 1(for B) and 0(for N).

Problem: I re-run the model. And I got different p-values/results (I do not mean the + p values became -, I mean like the numbers changed).

What I did to solve it: Then, I reversed the order (I gave 0 for B, and 1 for N) to see if it changes anything. And I got the same p-values and coefficients as when I coded it as B and N (great!). But do you have any idea why that might be?

Edit: I add here a reproducible example: the data with only 80 rows: https://home.mycloud.com/action/share/dedef0a3-794c-4ccc-b245-f93559de1f33

katilimci = factor(dENEME$Participants)
resimler = factor(dENEME$ImageID)
bugu0 = factor(dENEME$Blurriness)
sira0 = factor(dENEME$TheOrderofTheImages)
cekicilik0 = factor(dENEME$TargetAttractiveness)
bugu1 = factor(dENEME$Blurriness2)
sira1= factor(dENEME$TheOrderofTheImages2)
cekicilik1 = factor(dENEME$TargetAttractiveness2)
library(nlme)
myModel1 = lme(Ratings~bugu0+sira0+cekicilik0+bugu0:cekicilik0+bugu0:sira0+sira0:cekicilik0+bugu0:cekicilik0:sira0,data = dENEME, random=list(katilimci=~1, resimler=~1),na.action = na.exclude)
myModel2 = lme(Ratings~bugu1+sira1+cekicilik1+bugu1:cekicilik1+bugu1:sira1+sira1:cekicilik1+bugu1:cekicilik1:sira1,data = dENEME, random=list(katilimci=~1, resimler=~1),na.action = na.exclude)
summary(myModel1)
summary(myModel2)

The resulting p-values are different and I could not find the reason why...

Edit 2: Another reproducible example:

library(nlme)
#fixed factors:
variable1<-as.factor(rep(c("A","B"),each=20))
variable2<-as.factor(sample(rep(c("A","B"),each=20)))
variable3<-as.factor(sample(rep(c("A","B"),each=20)))
#y variable:
ratings<-c(rnorm(20,0,2),rnorm(20,1,6))
#random factor:
ID<-as.factor(paste("ID",rep(1:20,times=2),sep=""))
#symmetrical matrixes:
contrasts(variable1)<-c(0,1)
#in the Line 11, for variable1, level A becomes 0, and level B becomes 1.
contrasts(variable2)<-c(0,1)
contrasts(variable3)<-c(0,1)

#model1:
m1<-lme(ratings~variable1*variable2*variable3,random=~1|ID)

contrasts(variable1)<-c(1,0)
#in the line 19, for variable1, level A becomes 1 and level B becomes 0. So, all the fixed variables mirrors each other in the data that we created.
contrasts(variable2)<-c(1,0)
contrasts(variable3)<-c(1,0)


#model2:
m2<-lme(ratings~variable1*variable2*variable3,random=~1|ID)


summary(m1)
summary(m2)
#we bind the parameters of the 2 models to see them together for comparison:
rbind(
summary(m1)[[20]][,1],
summary(m2)[[20]][,1]
)

Because B precedes N in the alphabet and R creates factor levels in lexical order by default. — Roland, Jan 01 '20 at 12:26
Yes, as you said, R must have given 0 to B and that is why I got the same results when I did it that way. But then would this change drastically affect the p-values? I would expect that it would mirror the results as in positives will become negatives, but the change in numbers, I could not explain?... — Yesim, Jan 01 '20 at 12:33
You should post a reproducible example. It doesn't need to be as complicated as the model where you discovered this, but it should be fully self-contained so readers here can run it and see what you saw. — user2554330, Jan 01 '20 at 13:20
Are the values numerical, or did you actually make the values categorical using `factor()`? — mikeck, Jan 01 '20 at 13:29
I made the values categorical by using factor(). I put an example code in the edit. Thank you for your response! — Yesim, Jan 02 '20 at 18:35

score 0 · Answer 1 · answered Jan 01 '20 at 14:14

I've had a go at making a reproducible example since there isn't one in the question.

require(nlme)

df <- data.frame(dv = c(rnorm(20, 0), rnorm(20, 1)),
                 Blurriness = factor(c(rep("B", 20), rep("N", 20))),
                 Random = factor(rep(rep(c("x", "y"), each = 5), 2)),
                 Blurriness_1_0 = rep(1:0, each = 20),
                 Blurriness_0_1 = rep(0:1, each = 20))

m <- list()
m[[1]] <- lme(dv ~ Blurriness, random = ~ Blurriness | Random, data = df)
m[[2]] <- lme(dv ~ Blurriness_1_0, random = ~ Blurriness_1_0 | Random, data = df)
m[[3]] <- lme(dv ~ Blurriness_0_1, random = ~ Blurriness_0_1 | Random, data = df)

models <- lapply(m, function(x) summary(x)$tTable)

This gives 3 models which hopefully show the behaviour you describe:

models
#> [[1]]
#>                 Value Std.Error DF    t-value     p-value
#> (Intercept) -0.138797 0.2864303 37 -0.4845752 0.630834098
#> BlurrinessN  1.008572 0.3451909 37  2.9217817 0.005901891
#> 
#> [[2]]
#>                     Value Std.Error DF   t-value     p-value
#> (Intercept)     0.8697753 0.2864293 37  3.036614 0.004366652
#> Blurriness_1_0 -1.0085723 0.3451909 37 -2.921781 0.005901898
#> 
#> [[3]]
#>                    Value Std.Error DF    t-value     p-value
#> (Intercept)    -0.138797 0.2864303 37 -0.4845752 0.630834098
#> Blurriness_0_1  1.008572 0.3451909 37  2.9217817 0.005901891

In this example, the p values are different only for the intercepts, which is what you would expect (it just tells us that the means of the two fixed-effects groups sit at different numbers of standard deviations from 0).

Perhaps this is not what you meant though - it's difficult to tell from your question without a reproducible example.

Categorical variable coding with 0s and 1s giving different outputs in R

1 Answers1