I have a tibble, df, with a factor, A, which I wish to:

1) copy to a new column, C, and
2) recode based on a second variable, B.

At the moment I'm doing it in the roundabout way shown below. I'm quite confused by the conditional recoding of factors. I also looked at dplyr's recode, but couldn't work out a smarter method.

library(tibble)

df <- tibble(
  A = factor(c(NA, "b", "c")),
  B = c(1, NA, 3)
)

My initial tibble

df
#> # A tibble: 3 x 2
#>        A     B
#>   <fctr> <dbl>
#> 1   <NA>     1
#> 2      b    NA
#> 3      c     3

Step #1 in my current solution

df$C <- with(df, ifelse(is.na(B), 'B is NA', A)) 
df
#> # A tibble: 3 x 3
#>        A     B       C
#>   <fctr> <dbl>   <chr>
#> 1   <NA>     1    <NA>
#> 2      b    NA B is NA
#> 3      c     3       2

Step #2 in my current solution

df$C <- dplyr::recode_factor(df$C, '2' = 'c')
df
#> # A tibble: 3 x 3
#>        A     B       C
#>   <fctr> <dbl>  <fctr>
#> 1   <NA>     1    <NA>
#> 2      b    NA B is NA
#> 3      c     3       c

How am I supposed to do this?

zx8754
Eric Fail

2 Answers

Using dplyr::if_else, convert factor to character, then convert to factor again:

library(dplyr)

df %>% 
  mutate(C = factor(if_else(is.na(B), "B is NA", as.character(A))))

# # A tibble: 3 x 3
#          A     B       C
#     <fctr> <dbl>  <fctr>
#   1   <NA>     1    <NA>
#   2      b    NA B is NA
#   3      c     3       c
zx8754
  • Thanks, that answers my question. I'm still surprised by what seems to me like going to one place and then coming back, i.e. _convert factor to character, then convert to factor again_. – Eric Fail Oct 24 '17 at 13:22
  • @EricFail if we didn't need to say "B is NA" we could just do: `df$C <- df$A; df$C[is.na(df$B)] <- NA` – zx8754 Oct 24 '17 at 13:25

The conversion is happening in ifelse. From the docs:

Value

A vector of the same length and attributes (including dimensions and "class") as test and data values from the values of yes or no. The mode of the answer will be coerced from logical to accommodate first any values taken from yes and then any values taken from no.

Because yes is "B is NA", a character vector, the output is a character vector. The values taken from A come through as their underlying integer codes, which are then coerced to character. That is a quirk of the implementation: factors are really integer vectors with class and levels attributes, and ifelse takes its attributes from test, not from yes or no.
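
A minimal sketch of that behaviour, using the same values as the question. The vectors are built directly instead of through the tibble, and `res` is just an illustrative name:

```r
# Same values as in the question, as plain vectors
A <- factor(c(NA, "b", "c"))
B <- c(1, NA, 3)

# A factor is stored as integer codes plus a levels attribute
typeof(A)       # "integer"
levels(A)       # "b" "c"
as.integer(A)   # NA 1 2

# ifelse() builds its result from `test`, so the class and levels of A
# are dropped and only the integer codes are carried over
res <- ifelse(is.na(B), "B is NA", A)
res             # NA "B is NA" "2"
typeof(res)     # "character"
```

The "2" in the last position is the integer code of level "c", which is why the question needed a second recoding step.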

You could also achieve this by copying A, adding "B is NA" to the acceptable levels, and then replacing a subset.

df$C <- df$A
levels(df$C) <- c(levels(df$C), "B is NA")
df$C[is.na(df$B)] <- "B is NA"
df
# # A tibble: 3 x 3
#        A     B       C
#   <fctr> <dbl>  <fctr>
# 1   <NA>     1    <NA>
# 2      b    NA B is NA
# 3      c     3       c

Note that if you don't add "B is NA" to the levels first, every replaced value becomes NA and R issues a warning. A factor can only hold values from its set of levels, so a new value has to be added to the levels explicitly before it can be assigned.
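
To illustrate the difference, here is a small sketch on plain vectors with the question's values. `C_bad` and `C_ok` are hypothetical names used only for this comparison:

```r
A <- factor(c(NA, "b", "c"))
B <- c(1, NA, 3)

# Without adding the level: the assignment produces NA and a warning
# ("invalid factor level, NA generated")
C_bad <- A
C_bad[is.na(B)] <- "B is NA"
is.na(C_bad[2])      # TRUE

# With the level added first: the assignment keeps the new value
C_ok <- A
levels(C_ok) <- c(levels(C_ok), "B is NA")
C_ok[is.na(B)] <- "B is NA"
as.character(C_ok)   # NA "B is NA" "c"
levels(C_ok)         # "b" "c" "B is NA"
```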

Nathan Werth