3

I am having an issue with displaying the correct grouping of a factor variable after using MICE. I believe this is an R thing, but I included it with mice just to be sure.

Here is a snippet showing how I format the variable before passing it to mice. Note that I want it to be 0 for no drug and 1 for drug, so I coerce it to a factor with levels 0 and 1 before I run the imputation:

mydat$drug=factor(mydat$drug,levels=c(0,1),labels=c(0,1))

I then run mice and it runs logistic regression (this is the default) on drug, along with my other variables to be imputed.

I can extract the results of one of the imputations when it is complete by

drug=complete(imp,1)$drug

We can view it

> head(drug)
[1] 0 0 1 0 1 1
attr(,"contrasts")
  2
0 0
1 1
Levels: 0 1

So the data is certainly 0,1.

However, when I do something with it, like cbind, it changes to 1's and 2's

> head(cbind(drug))
 drug
[1,]    1
[2,]    1
[3,]    2
[4,]    1
[5,]    2
[6,]    2

Even when I coerce it to a numeric

> head(as.numeric(drug))
[1] 1 1 2 1 2 2

I want to say it has something to do with the contrasts, but when I delete the contrast by doing

attr(drug,"contrasts")=NULL

it still shows up as 1's and 2's when it is passed to and printed by other functions.

I am able to get it to print correctly by using I()

> head(I(drug))
[1] 0 0 1 0 1 1
Levels: 0 1

So, I believe that this is an R issue, but I don't know how to remedy it. Is using I() the correct solution, or is it just a workaround that happens to work here? What is actually happening behind the scenes that is making the output display as 1's and 2's?

Thanks

RayVelcoro
  • `cbind` is returning a matrix, which doesn't store factors (it will only store character strings and numerics). In the conversion to a matrix, your factors are being represented by their numerical coding, not by the character label. All factors are stored as integers where the first level is 1, and the subsequent levels are appropriately numbered. Your best option for remedying it is to avoid storing factors into matrices. – Benjamin Aug 12 '15 at 16:50
  • Actually, head(cbind(I(drug))) still yields 1's and 2's, so that must not be the correct solution. – RayVelcoro Aug 12 '15 at 16:50
  • But you're still using `cbind`, which is desperately trying to return a matrix. And you can't store factors in a matrix. You should consider solutions that don't involve `cbind`. – Benjamin Aug 12 '15 at 16:54
  • @Benjamin I see what you mean. So, this seems to work: as.numeric(cbind(as.character(drug))). But this is a little bulky. Is there a more streamlined way to do this? – RayVelcoro Aug 12 '15 at 16:54
  • I'm a little hesitant to advise you about how to do what you're doing because I'm unsure of what you are intending to do afterward. – Benjamin Aug 12 '15 at 17:00
  • Right now, all I want to do is display it as it is. I think that later I will want to perhaps break down response based off of drug. The reason I want it displayed as 0/1 now is because that is how it is originally coded, and it would be very confusing if 1 could both be drug and no drug. Thus, I want to standardize it so 0 is always no drug and 1 is always drug. – RayVelcoro Aug 12 '15 at 17:04
  • Since drug is already coded as 0/1, you might consider _not_ converting to a factor in the first place. – A. Webb Aug 12 '15 at 17:38
  • MICE by default treats a vector of 0/1 as numeric, and thus it will default to doing pmm on those. I could leave it and then change each method by hand to logreg. Thanks for the suggestion! – RayVelcoro Aug 12 '15 at 17:41
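
To illustrate the point made in the comments about `cbind` and matrices, here is a minimal sketch. The `drug` vector is a made-up stand-in for `complete(imp, 1)$drug`, not the imputed data from the question.

```r
# Hypothetical stand-in for complete(imp, 1)$drug
drug <- factor(c(0, 0, 1, 0, 1, 1), levels = c(0, 1))

# cbind() builds a matrix, which cannot hold a factor,
# so the internal integer codes (1, 2) are stored instead
m <- cbind(drug)
class(m[, "drug"])   # "integer"

# data.frame() keeps the column as a factor, labels intact
d <- data.frame(drug)
class(d$drug)        # "factor"
levels(d$drug)       # "0" "1"
```

If the goal is only to display or tabulate the variable next to others, keeping it in a data frame avoids the conversion altogether.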

3 Answers

2

The 0s and 1s are the names (labels) of your levels. The underlying integer codes corresponding to those names are 1 and 2. You can see this with str:

str(drug)
# Factor w/ 2 levels "0","1": 2 2 2 2 2 2 1 1 2 2

When you coerce the factor to numeric, you drop the names and get the integer representation.
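
A small sketch of the two pieces a factor is made of, using a made-up vector rather than the data from the question:

```r
# Hypothetical example vector, not the data from the question
drug <- factor(c(0, 0, 1, 0, 1, 1), levels = c(0, 1))

levels(drug)       # the labels: "0" "1"
unclass(drug)      # the stored integer codes, plus a "levels" attribute
as.integer(drug)   # codes only: 1 1 2 1 2 2

# Printing looks up each code in levels(); this reproduces what is displayed
levels(drug)[as.integer(drug)]   # "0" "0" "1" "0" "1" "1"
```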

Rorschach
2

Factors start with the first level being represented internally by 1.

Your two options:

1) Adjust for 1-based index of levels:

as.numeric(drug) - 1

2) Take the labels of the factors and convert to numeric:

as.numeric(as.character(drug))

Some people will point you in the direction of the faster option that does the same thing:

as.numeric(levels(drug))[drug]
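
As a quick check, here is a sketch comparing the three conversions on a made-up vector. The `dose` factor is a hypothetical extra example, included only to show where option 1 stops being safe.

```r
# Hypothetical example vector, not the data from the question
drug <- factor(c(0, 0, 1, 0, 1, 1), levels = c(0, 1))

a  <- as.numeric(drug) - 1              # relies on codes 1, 2 lining up with labels 0, 1
b  <- as.numeric(as.character(drug))    # converts the label of every element
c2 <- as.numeric(levels(drug))[drug]    # converts each level once, then indexes by code

identical(a, b) && identical(b, c2)     # TRUE for this 0/1 factor

# Option 1 depends on the labels being 0, 1, 2, ... in order
dose <- factor(c(10, 50, 10))
as.numeric(dose) - 1                    # 0 1 0, not the original doses
as.numeric(levels(dose))[dose]          # 10 50 10
```

For a 0/1 factor all three agree; the label-based versions are the ones that still work if the labels ever change.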

I'd also consider using logical values instead of a factor in the first place:

mydat$drug = as.logical(mydat$drug) 
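
One caveat, sketched with a made-up vector: `as.logical` behaves as expected on the raw 0/1 numbers, but on a factor it uses the labels, so it should be applied before (or instead of) the `factor()` call. The name `raw` is hypothetical.

```r
# Hypothetical raw column coded 0/1, before any factor() call
raw <- c(0, 0, 1, 0, 1, 1)
as.logical(raw)            # FALSE FALSE TRUE FALSE TRUE TRUE

# On a factor, as.logical() looks at the labels "0"/"1",
# which are not recognised as TRUE/FALSE strings
as.logical(factor(raw))    # NA for every element

# Going through the labels first recovers the logical values
as.logical(as.numeric(as.character(factor(raw))))
```
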
Señor O
0

This is how R encodes factors. The underlying integer representation of a factor always starts at 1, whatever the labels are, as you can see with the following two examples:

as.numeric(factor(c(0,1)))
as.numeric(factor(c("A","B")))

I'm not sure about the specifics of how MICE works, but if it requires a factor instead of a simple 0/1 numeric variable in order to use logistic regression, you can always convert the results back with something like the following:

as.numeric(as.character(factor(c(0,1)))) 

or in your specific case

drug <- as.numeric(as.character(drug))