Applying labels is an important part of making survey data comprehensible when it is reported.

The best example I can find uses expss::apply_labels(), e.g. the famous mtcars example: https://cran.r-project.org/web/packages/expss/vignettes/tables-with-labels.html

As input this requires a data.table and a list of comma-separated assignment pairs, e.g.

apply_labels(dt, col1 = "label1", col2 = "label2", col3 = "label3")

This is fine if you have one data file and a few columns and you can be bothered typing them in each time, but it's not very helpful if you have lots of data files. So how could one load a CSV metadata file in the format:

Col1 Col2 Col3

Label1 Label2 Label3

where the Col names match the column names in the data table?

This means effectively translating the metadata CSV file so that it generates

coln = "labeln"

for each column.

So far the biggest problem I have found is that the apply_labels column names are objects, not strings, and it is very difficult to translate a string to the object in the right scope.

This is where I've got to

    library(expss)
    library(data.table)
    library(glue)

    readcsvdata <- function(dfile)
     {
        rdata <- fread(file = dfile, sep = "," , quote = "\"" , header = TRUE, 
        stringsAsFactors = FALSE, na.strings = getOption("datatable.na.strings","NA"))
        return(rdata)
        }

    rawdatafilename <- "testdata.csv"
    rawmetadata <- "metadata.csv"

    mdt <- readcsvdata(rawmetadata)
    rdt <-readcsvdata(rawdatafilename)
    commonnames <- intersect(names(mdt),names(rdt))  # find common 
    qlabels <- as.character(mdt[1, commonnames, with = FALSE])

    comslist <- list()
    for (i in 1:length(commonnames)) # loop through commonnames and qlabels
          {  
          if (i == length(commonnames))
              {x <- glue('{commonnames[i]} = "{qlabels[i]}"')} # no comma for final item
              else 
              {x <- glue('{commonnames[i]} = "{qlabels[i]}",')} # comma for next item

          comslist[[i]] <- x
    }

    comstring <- paste(unlist(comslist), collapse = '')

    tdt = apply_labels(tdt, eval(parse(text = comstring)))

which yields

    Error in parse(text = comstring) : <text>:1:24: unexpected ','
    1: varone = "Label1",
                        ^

Also, print(comstring) produces:

[1] "varone = \"Question one\",vartwo = \"Question two\",varthree = \"Question three\",varfour = \"Question four\",varfive = \"Question five\",varsix = \"Question six\",varseven = \"Question seven\",vareight = \"Question eight\",varnine = \"Question nine\",varten = \"Question ten\""

Peter King
  • If that's truly a CSV file, and you read that in with `read.csv` (or `fread` or whatever), then `do.call(apply_labels, c(list(dt), csvdat))` should work. – r2evans May 27 '20 at 04:02
  • You can use `var_lab` in a loop: `for(each in colnames(metadata)) var_lab(dt[[each]]) = metadata[[each]]` – Gregory Demin May 27 '20 at 10:22

2 Answers

I don't have expss handy, but I think this is generically about how to programmatically assign function arguments in R.

If you start with a CSV file that contains the three pairings you need,

    csvdat <- read.csv(stringsAsFactors=FALSE, text="
    col1,col2,col3
    label1,label2,label3")

I'll write a fake function (since I don't have expss, and it's not critical) that takes a first argument and zero or more follow-on arguments dynamically.

    library(data.table)  # needed for data.table() below

    my_fake_labels <- function(x, ...) {
      dots <- list(...)  # collect the zero or more follow-on arguments
      message("x labels   : ", paste(sQuote(colnames(x)), collapse = ", "))
      message("other names: ", paste(sQuote(names(dots)), collapse = ", "))
    }
    origDT <- data.table(aa=1, bb=2)

    my_fake_labels(origDT, col1="label1", col2="label2", col3="label3")
    # x labels   : 'aa', 'bb'
    # other names: 'col1', 'col2', 'col3'

It's that manual argument-setting that you're trying to avoid. (I know I'm not doing any label-setting here, let's ignore that for now.)

The programmatic way of doing this uses origDT as the first argument and the elements of csvdat as the second and subsequent arguments:

    do.call(my_fake_labels, c(list(origDT), csvdat))
    # x labels   : 'aa', 'bb'
    # other names: 'col1', 'col2', 'col3'

The second argument to do.call needs to be a list, optionally named. Since a data.frame (and therefore a data.table) is just a glorified named list, this fits the bill. What this does is take each element of the list and apply it as the corresponding arguments of the function (the first argument of do.call).

The list(origDT) is because normally the c(...) function would concatenate the columns/elements of the two lists. If we did just c(origDT, csvdat), then the function would be called with ncol(origDT) + ncol(csvdat) arguments, instead of the desired 1 + ncol(csvdat). For this, c(list(origDT), ...) makes sure that the whole origDT is the function's first argument.
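To make the difference between the two forms concrete, here is a minimal base-R sketch. `count_args` is a hypothetical stand-in (not part of expss) for any function with a `(data, ...)` signature, and a plain data.frame is used in place of a data.table so that nothing needs to be installed:

```r
# Hypothetical stand-in for a function with a (data, ...) signature,
# such as expss::apply_labels; it only reports what it received.
count_args <- function(x, ...) {
  dots <- list(...)
  list(first_class = class(x)[1],
       n_other     = length(dots),
       other_names = names(dots))
}

orig   <- data.frame(aa = 1, bb = 2)
labels <- list(col1 = "label1", col2 = "label2", col3 = "label3")

# Wrapped: the whole data frame is a single list element,
# so the argument list has 1 + 3 entries.
wrapped <- c(list(orig), labels)
length(wrapped)

# Not wrapped: c() splices the two columns in alongside the labels,
# so the argument list has 2 + 3 entries and no data frame at all.
spliced <- c(orig, labels)
length(spliced)
names(spliced)

# Only the wrapped form hands the data frame over as the first argument.
res <- do.call(count_args, wrapped)
res$first_class
res$other_names
```

In the spliced form every element is named and none is named `x`, so calling `do.call(count_args, spliced)` would leave the first argument missing.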

(It might also be easy to form the csvdat programmatically instead of requiring an external file, but I'm guessing that you have a reason to do it via CSV.)

r2evans
  • This may be very sophisticated but I'm afraid I simply don't understand it. I don't see what your function my_fake_labels is for. Is it a proxy for expss apply_labels for the sake of argument? What is list(...)? Please forgive a struggling beginner. – Peter King May 28 '20 at 04:56
  • *"I'll write a fake function (since I don't have expss)"*. Add to that *"this fake function takes the same arguments as your `apply_labels` so behaves similarly as far as we need it to here"*. Just replace it with your `expss::apply_labels` and see what happens. – r2evans May 28 '20 at 05:06
  • `list(...)` is R's way of (re)packaging up an arbitrary (0 or more) length of arguments. – r2evans May 28 '20 at 05:07
  • Tried `do.call`, adding the first parameter (the name of the data.table): `tdt <- copy(rdt)`; `comslist <- prepend(comslist, "tdt")` (using purrr); `tdt <- do.call(expss::apply_labels, comslist)`. The result was `Error in UseMethod("apply_labels") : no applicable method for 'apply_labels' applied to an object of class "character"` – Peter King May 29 '20 at 23:22
  • I think that `purrr::prepend` is stripping the class from your `comslist`, so `expss::apply_labels` does not know what to do with it. I don't know why you are prepending a literal string `"tdt"` to the list, though; that seems odd. Can't you just do `do.call(apply_labels, c(list(tdt), comslist))`? – r2evans May 29 '20 at 23:43

apply_labels is not very convenient for assigning labels from an external dictionary. You can use var_lab instead:

    library(expss)
    library(data.table)

    readcsvdata <- function(dfile)
    {
        rdata <- fread(file = dfile, sep = "," , quote = "\"" , header = TRUE, 
                       stringsAsFactors = FALSE, na.strings = getOption("datatable.na.strings","NA"))
        return(rdata)
    }

    rawdatafilename <- "testdata.csv"
    rawmetadata <- "metadata.csv"

    mdt <- readcsvdata(rawmetadata)
    rdt <-readcsvdata(rawdatafilename)
    commonnames <- intersect(names(mdt),names(rdt))  # find common 
    qlabels <- as.list(mdt[1, commonnames, with = FALSE])


    for (each_name in commonnames) # loop through commonnames and qlabels
    {  
        var_lab(rdt[[each_name]]) <- qlabels[[each_name]]
    }

There is a similar val_lab function for value labels. Additionally you may be interested in apply_dictionary and create_dictionary functions. To get help about them type ?apply_dictionary in the console.
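For comparison, here is a small sketch of the `do.call` route with `apply_labels`, followed by `val_lab` for value labels. It assumes expss is installed; the toy data frame, the inline metadata text and the Yes/No value labels are all made up for illustration, and a plain data.frame is used rather than a data.table:

```r
library(expss)

# Toy stand-ins for the data file and the one-row metadata file.
dat <- data.frame(varone = c(1, 2, 1), vartwo = c(3, 4, 5))
meta <- read.csv(stringsAsFactors = FALSE,
                 text = "varone,vartwo,varthree
Question one,Question two,Question three")

# Keep only the metadata columns that exist in the data,
# then turn that one row into a named list: list(varone = "...", vartwo = "...").
common <- intersect(names(meta), names(dat))
label_args <- as.list(meta[1, common, drop = FALSE])

# Equivalent to apply_labels(dat, varone = "Question one", vartwo = "Question two").
labelled <- do.call(apply_labels, c(list(dat), label_args))
var_lab(labelled$varone)

# Value labels are a named vector: names are the labels, values are the codes.
val_lab(labelled$varone) <- c("Yes" = 1, "No" = 2)
val_lab(labelled$varone)
```

The `intersect` step matters here because the metadata may describe columns (such as `varthree` above) that are absent from a given data file.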

Gregory Demin
  • Thanks very much for that. Small point, however: `var_lab(rdt[[each_name]]) = qlabels[[each_name]]` doesn't work; `var_lab(rdt[[each_name]]) <- qlabels[[each_name]]` (as per the manual) does, if you would like to edit. Cheers. – Peter King Jun 08 '20 at 01:05
  • @PeterKing Thanks for reporting. I edited the answer. But really it is very strange: it should make no difference in this context. – Gregory Demin Jun 08 '20 at 09:45