5

I have a two-level factor in my data that I want to convert to logical:

str(df$y)
 Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...

I use as.logical(df$y) to convert it to logical. However, every value turns into NA:

a <- as.logical(df$y)
summary(a)

      Mode    NA's 
    logical  500000

At which point do I fail to convert the data?

JRG
user3292755
  • 2
    Since you have a factor variable, I suggest you convert to character, then to integer and finally to logical: `as.logical(as.integer(levels(a))[a])`. You can read about it in the "Value" section of `?as.logical` – talat Jul 26 '17 at 14:53
  • 1
    There is a short cut: `y <- factor(0:1); str(y); as.logical(as.integer(y) - 1L)` returns `[1] FALSE TRUE` – Uwe Jul 26 '17 at 15:03
  • 3
    Just test for equality with whatever level you want to be "true": `df$y == "1"`. The result will be your logical vector. – Gregor Thomas Jul 26 '17 at 15:38

5 Answers

7

At which point do I fail to convert the data?

I'd argue that at no point do you fail to convert the data; it's the function that is a bit odd and fails to understand the nature of your data.

If you read ?as.logical you'll see that when the input is a factor, the levels (which are character) are used in the conversion. The only character strings accepted are "T", "TRUE", "true", "True" and "F", "FALSE", "false", "False"; everything else, including "0" and "1", returns NA. 0 and 1 are, however, interpreted as FALSE and TRUE, respectively, when they are given as numeric, so all of the following work:

y <- factor(c(0, 1, 1, 0))

as.logical(as.integer(levels(y)[y]))
as.logical(as.integer(y) - 1L)
as.logical(as.integer(as.character(y)))

A bit cumbersome, I know, but that's how it is.
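To make the difference between the character and numeric routes concrete, here is a minimal sketch. The variable names (`from_factor`, `from_strings`, `from_numbers`) are only illustrative and do not come from the question.

```r
# Same illustrative factor as in the answer above
y <- factor(c(0, 1, 1, 0))

# as.logical() looks at the level labels "0" and "1", which are not
# among the strings it recognises, so every element becomes NA
from_factor <- as.logical(y)
print(from_factor)
# [1] NA NA NA NA

# Recognised strings convert; "0" and "1" as strings do not
from_strings <- as.logical(c("TRUE", "true", "T", "0", "1"))
print(from_strings)
# [1] TRUE TRUE TRUE   NA   NA

# Numeric 0 and 1 convert as expected
from_numbers <- as.logical(c(0, 1))
print(from_numbers)
# [1] FALSE  TRUE
```

This is why each working variant first turns the factor into numbers before calling as.logical().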

AkselA
5

Indeed, there is a straightforward method.

Since you have a two-level factor, identify which level is true and which is false:

df <- data.frame(y=factor(sample(c("0","1"),10,replace = TRUE)))

str(df$y)
#  Factor w/ 2 levels "0","1": 2 2 2 1 1 2 2 2 2 2

levels(df$y) <- c(FALSE,TRUE)
df$y <- as.logical(df$y)

str(df$y)
# logi [1:10] TRUE TRUE TRUE FALSE FALSE TRUE ...

1

This is probably a little too late to be helpful, but I ran into a similar problem and found a fix:

as.logical(as.integer(data.frame$column))

should do the trick.

  • Should probably note that this works *only if* the factor representation for `TRUE` is 1 and `FALSE` is 0. Any other value for the factors gets coerced into `TRUE` – Giulio Centorame Jun 02 '23 at 05:12
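The caveat can be checked with a small sketch. The factors `y` and `z` below are made-up examples, not data from the question. as.integer() on a factor returns the internal codes, which start at 1, so converting the codes directly yields TRUE everywhere; subtracting 1 only helps when the "0" level happens to come first.

```r
# Hypothetical factor with labels "0" and "1" in the default level order
y <- factor(c("0", "1", "1", "0"))

codes <- as.integer(y)           # internal codes, not the labels
print(codes)
# [1] 1 2 2 1

all_true <- as.logical(codes)    # every non-zero code becomes TRUE
print(all_true)
# [1] TRUE TRUE TRUE TRUE

shifted <- as.logical(codes - 1L)  # works here because "0" is the first level
print(shifted)
# [1] FALSE  TRUE  TRUE FALSE

# Same data, but with the level order reversed (an assumption for illustration)
z <- factor(c("0", "1", "1", "0"), levels = c("1", "0"))

inverted <- as.logical(as.integer(z) - 1L)  # the codes shortcut now flips the result
print(inverted)
# [1]  TRUE FALSE FALSE  TRUE

via_labels <- as.logical(as.integer(as.character(z)))  # goes through the labels
print(via_labels)
# [1] FALSE  TRUE  TRUE FALSE
```

Going through as.character() uses the labels rather than the codes, so it does not depend on the level order.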
0

You can use == to create TRUE and FALSE values:

y = factor(c(0, 1, NA))
y == "1"
# [1] FALSE  TRUE    NA

Gregor Thomas
0

You can use the == operator to identify which factor level should be TRUE. The comparison already returns a logical vector, so the as.logical() wrapper is optional:

as.logical(factor_vector == 'True value here')
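As a rough illustration of the comparison approach with labels other than "0" and "1", the sketch below uses a made-up factor `f` with levels "no" and "yes". It also shows one way missing values could be handled, assuming you want NA to count as FALSE.

```r
# Hypothetical factor with a missing value
f <- factor(c("yes", "no", NA, "yes"))

# Comparing against the "true" label keeps NA as NA
is_yes <- f == "yes"
print(is_yes)
# [1]  TRUE FALSE    NA  TRUE

# %in% never returns NA, so missing values become FALSE
is_yes_no_na <- f %in% "yes"
print(is_yes_no_na)
# [1]  TRUE FALSE FALSE  TRUE
```

Which of the two fits depends on whether missing values should stay missing in the logical result.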