
I have a Stata dataset (.dta file) that contains one variable, RGA (this is a MWE, I actually have tons of variables). This variable takes 3 factor values: 1, 2 and 3. These factors refer to meaningful things (so-called "value labels") and the association between the factors and their value labels is in a separate .txt Stata-like file, fully reproduced here:

    . label define RGA_l
        1 "meaning of 1"
        2 "meaning of 2"
        3 "meaning of 3"

    . label values RGA RGA_l

I load my .dta file into R with the haven package. I would like easy access to RGA's value labels within R, in particular so that I can quickly match RGA's values to their labels and produce readable output. How can I read this separate .txt file into R so that I can match it with my dataset?

Ben

1 Answer


I don't know exactly what type of column haven imported (try calling `str()` on your data frame), but here is how you create factors in R. The `factor()` function is somewhat confusing: a factor does not have labels per se, only integer codes and a `levels` attribute, yet the argument that sets the displayed text is still called `labels`.

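    set.seed(100)
    # Simulated stand-in for the imported column: 10 draws from the codes 1, 2, 3
    df <- data.frame(RGA_1 = sample.int(3, 10, replace = TRUE))

    # Pass `levels` explicitly so each code keeps its own label even if a code is absent
    df$RGA_1 <- factor(df$RGA_1, levels = 1:3, labels = c("meaning1", "meaning2", "meaning3"))
    df
    #>       RGA_1
    #> 1  meaning1
    #> 2  meaning1
    #> 3  meaning2
    #> 4  meaning1
    #> 5  meaning2
    #> 6  meaning2
    #> 7  meaning3
    #> 8  meaning2
    #> 9  meaning2
    #> 10 meaning1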
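
To make the labels-versus-levels point concrete, here is a minimal sketch in base R. The vector `codes` is made up for illustration and only stands in for the RGA column from the question.

```r
# Hypothetical codes standing in for the RGA column
codes <- c(1, 3, 2, 1)

f <- factor(codes,
            levels = 1:3,
            labels = c("meaning of 1", "meaning of 2", "meaning of 3"))

# The text passed as `labels` is stored as the factor's levels
levels(f)

# The underlying data are integer positions into those levels
as.integer(f)
```

So once the factor is built, the original text is reachable through `levels()`, and there is no separate "labels" attribute to query.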

Created on 2018-05-30 by the reprex package (v0.2.0).

Calum You
  • Thanks! However my question is about automating the value label import from the .txt file. Manually setting them as you proposed is impractical, since I have 500+ variables I want to label the values of. This .txt file is definitely made of Stata commands, so I'm hoping that somebody wrote a parser in R to read it. – Ben May 30 '18 at 23:22
  • Then it would be good to edit the question to reflect that, I think. See the [haven documentation](http://haven.tidyverse.org/articles/semantics.html), which seems to indicate that haven creates `labelled` class vectors that can be converted to factors with `as_factor()` (assuming the columns in the Stata file are already labelled, i.e. you have already run that do-file). – Calum You May 30 '18 at 23:33
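
If the labels exist only in the text file, the automated import Ben asks about could be sketched in base R as below. This is not an existing parser, only a rough sketch under stated assumptions: the file follows exactly the layout shown in the question (a `label define NAME` line, then one `code "text"` pair per line, then `label values VAR NAME`), and the codes are integers. The function name `parse_stata_labels` and the file name `labels.txt` are hypothetical.

```r
# Hypothetical helper: parse Stata-style "label define" / "label values" lines.
# Assumes the multi-line layout from the question, not every form Stata accepts.
parse_stata_labels <- function(lines) {
  lines <- trimws(lines)
  lines <- sub("^\\.[[:space:]]*", "", lines)  # drop the leading ". " prompt

  definitions <- list()  # label-set name -> named vector (names = text, values = codes)
  assignments <- list()  # variable name  -> label-set name
  current <- NULL        # label set currently being defined

  for (ln in lines) {
    if (grepl("^label define[[:space:]]+", ln)) {
      current <- sub("^label define[[:space:]]+([^[:space:]]+).*$", "\\1", ln)
      definitions[[current]] <- numeric(0)
    } else if (grepl("^label values[[:space:]]+", ln)) {
      parts <- strsplit(ln, "[[:space:]]+")[[1]]  # "label" "values" VAR NAME
      assignments[[parts[3]]] <- parts[4]
      current <- NULL
    } else if (!is.null(current) && grepl('^-?[0-9]+[[:space:]]+"', ln)) {
      m <- regmatches(ln, regexec('^(-?[0-9]+)[[:space:]]+"(.*)"$', ln))[[1]]
      definitions[[current]] <- c(definitions[[current]],
                                  setNames(as.numeric(m[2]), m[3]))
    }
  }
  list(definitions = definitions, assignments = assignments)
}

# In practice: txt <- readLines("labels.txt")  # hypothetical file name
txt <- c('. label define RGA_l',
         '    1 "meaning of 1"',
         '    2 "meaning of 2"',
         '    3 "meaning of 3"',
         '',
         '. label values RGA RGA_l')
parsed <- parse_stata_labels(txt)

# Made-up data standing in for the imported .dta file
df <- data.frame(RGA = c(1, 3, 2, 1))

# Convert every variable that has an assigned label set into a factor
for (v in intersect(names(parsed$assignments), names(df))) {
  labs <- parsed$definitions[[parsed$assignments[[v]]]]
  df[[v]] <- factor(df[[v]], levels = unname(labs), labels = names(labs))
}
df
```

Each parsed label set is a named vector whose names are the label text and whose values are the codes. That is the shape haven expects for the `labels` argument of `labelled()`, so an alternative to `factor()` would be to attach the labels with `haven::labelled(df[[v]], labels = labs)` and convert later with `as_factor()`.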