I have a data set with labelled data and would like to create a new column containing only the label as character.

Consider the following example:

library(dplyr)
library(tibble)

value_labels <- tibble(value = 1:6, labels = paste0("value", 1:6))
df_data <- tibble(id = 1:10, var = floor(runif(10, 1, 6)))
df_data <- df_data %>% mutate(var = haven::labelled(var, labels = deframe(value_labels[2:1])))

This yields:

# A tibble: 10 x 2
      id        var
   <int>  <dbl+lbl>
 1     1 2 [value2]
 2     2 2 [value2]
 3     3 4 [value4]
 4     4 2 [value2]
 5     5 4 [value4]
 6     6 3 [value3]
 7     7 5 [value5]
 8     8 4 [value4]
 9     9 3 [value3]
10    10 1 [value1]

I would now like to create an additional column labs containing only the labels (i.e. value2 in rows 1 & 2, value4 in row 3, etc.).

I tried using val_labels() (df_data %>% mutate(labs = val_labels(df_data$var, var))) without success. Can someone point out the right way to do this?

Ivo

3 Answers

haven::as_factor() is used for this. See the examples of the help page for labelled vectors.

library(haven)

df_data %>%
  mutate(labs = as_factor(var))

# A tibble: 10 × 3
      id        var  labs 
   <int>  <dbl+lbl> <fct> 
 1     1 2 [value2] value2
 2     2 5 [value5] value5
 3     3 2 [value2] value2
 4     4 5 [value5] value5
 5     5 2 [value2] value2
 6     6 4 [value4] value4
 7     7 5 [value5] value5
 8     8 4 [value4] value4
 9     9 5 [value5] value5
10    10 3 [value3] value3
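
Since the question asks for the label as character and as_factor() returns a factor, one option is to wrap the result in as.character(). The following is a minimal sketch; df_demo and its fixed values are made up for illustration and are not the data from the question.

```r
library(dplyr)
library(haven)

# Hypothetical toy data with fixed values (assumption, not the question's data)
df_demo <- tibble::tibble(
  id  = 1:4,
  var = labelled(
    c(2, 4, 2, 1),
    labels = c(value1 = 1, value2 = 2, value3 = 3, value4 = 4)
  )
)

# as_factor() maps each value to its label; as.character() drops the factor class
df_chr <- df_demo %>%
  mutate(labs = as.character(as_factor(var)))

df_chr$labs
```

The new labs column should then be a plain character vector rather than a factor.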
Darren Tsai
  • Great. Why does `unnest` not work here? – TarJae Apr 20 '22 at 15:42
  • @TarJae `tidyr::unnest` works on list-columns of a data frame. A labelled vector supported by `haven` is an ordinary vector with text labels attached to its values, so `unnest` doesn't make sense here. – Darren Tsai Apr 20 '22 at 15:58

We can use get_labels() from sjlabelled:

library(dplyr)
library(sjlabelled)
df_data %>% 
   mutate(labs = get_labels(var)[var])

Output:

# A tibble: 10 × 3
      id        var labs  
   <int>  <dbl+lbl> <chr> 
 1     1 3 [value3] value3
 2     2 3 [value3] value3
 3     3 2 [value2] value2
 4     4 4 [value4] value4
 5     5 5 [value5] value5
 6     6 3 [value3] value3
 7     7 3 [value3] value3
 8     8 4 [value4] value4
 9     9 1 [value1] value1
10    10 2 [value2] value2
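
Note that indexing the label vector with [var] is positional, so it appears to rely on the codes being 1, 2, 3, ... in the same order as the labels. If the codes were, say, 10, 20, 30, a lookup by value would be safer. A minimal sketch using only haven and base R, with a made-up vector x (an assumption for illustration):

```r
library(haven)

# Hypothetical labelled vector whose codes are not 1..n
x <- labelled(c(10, 30, 10, 20), labels = c(low = 10, mid = 20, high = 30))

# The "labels" attribute is a named vector: names are labels, values are codes
lab_map <- attr(x, "labels")

# Look up each code by value (not by position) and take the matching label name
labs <- names(lab_map)[match(zap_labels(x), lab_map)]

labs
```

Codes without a matching label come back as NA with this approach.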
akrun

I used the labelled package for the same purpose.

library(dplyr)
library(labelled)

df_data %>% 
   mutate(labs = to_factor(var))

Output:

# A tibble: 10 × 3
      id        var labs  
   <int>  <dbl+lbl> <fct> 
 1     1 5 [value5] value5
 2     2 4 [value4] value4
 3     3 5 [value5] value5
 4     4 5 [value5] value5
 5     5 2 [value2] value2
 6     6 5 [value5] value5
 7     7 2 [value2] value2
 8     8 5 [value5] value5
 9     9 5 [value5] value5
10    10 1 [value1] value1

I also found it useful to convert the whole data frame at once:

df_factors <- to_factor(df_data)
df_factors

Output:

# A tibble: 10 × 2
      id var   
   <int> <fct> 
 1     1 value5
 2     2 value4
 3     3 value5
 4     4 value5
 5     5 value2
 6     6 value5
 7     7 value2
 8     8 value5
 9     9 value5
10    10 value1
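
If a character column is wanted instead of a factor, the labelled package also provides to_character(). A small sketch on a made-up vector x (an assumption for illustration, not the question's data):

```r
library(labelled)

# Hypothetical labelled vector
x <- labelled(c(2, 1, 2), labels = c(value1 = 1, value2 = 2))

# to_character() returns the labels as a character vector
labs <- to_character(x)

labs
```

Inside a pipeline this would be used the same way as to_factor(), e.g. mutate(labs = to_character(var)).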
Aashish KC