1

I have a data frame with lots of columns, and I want to edit columns' attributes, so that each column attribute will be based of a matched value from a lookup table.

Dummy data

df <- data.frame(id = c(1,2,3), age = c(10, 30, 55), eye_color = c("blue", "brown", "green"))

> df

#   id age eye_color
# 1  1  10      blue
# 2  2  30     brown
# 3  3  55     green

If I just wanted to change the attribute of a single column df$id, I would do:

attr(df$id, "label") <- "Person's ID"

> attr(df$id, "label")
# [1] "Person's ID"

However, I need to edit the "label" attributes of all columns, and I want to be more efficient. So my hope is to rely on a separate table that matches column names and "label" attributes. (In reality, I'd import a CSV file created manually outside of R, but for the sake of post reproducability, here's a dummy table that conveys my point. It could have been a data frame, as the object type doesn't matter):

label_dictionary <-
  matrix(
    c(
      "id",
      "Person's ID",
      "age",
      "Person's age when taking the survey",
      "eye_color",
      "Person's eye color"
    ),
    ncol = 2,
    byrow = TRUE
  )
colnames(label_dictionary) <- c("variable", "label")
label_dictionary <- as.table(label_dictionary)

> label_dictionary

#   variable  label                              
# A id        Person's ID                        
# B age       Person's age when taking the survey
# C eye_color Person's eye color     

My question

How could I edit the "label" attributes of all columns in my dataframe df, based on matching values from the label_dictionary table? (Assuming the order of values in label_dictionary$variable doesn't necessarily match the order of colnames(df)).

Thanks!

Emman
  • 3,695
  • 2
  • 20
  • 44

1 Answers1

3

Instead of as.table, I suggest to use setNames:

label_dictionary <- read.csv("myfile.csv", stringsAsFactors=FALSE)
label_dictionary <- setNames(label_dictionary[,"label"], label_dictionary[,"variable"])

Basically, you are converting the matrix or data.frame object into a named vector. Then you use this vector to set the attribute by a simple lookup:

for (x in colnames(df)) attr(df[,x], "label") <- label_dictionary[x]

The loop iterates over all column names and sets the attribute.

Karsten W.
  • 17,826
  • 11
  • 69
  • 103
  • Thanks, but I'm not sure I understand how to implement your answer, neither the setNames approach nor the for loop. As for the `as.table`, it's really not critical, and I used it only as a demonstration. In reality, my "label_dictionary" file is a long CSV file (1 column for variable name, and another for attribute values), that needs to get imported to R. – Emman Sep 16 '19 at 11:49
  • Edited the answer to clarify. – Karsten W. Sep 16 '19 at 11:54
  • Thanks. I had to modify your for loop so it addresses the intention to edit the label attribute specifically. So my modified for loop is: `for (x in colnames(df)) attr(df[,x], "label") <- label_dictionary[x]`. **However, the output is somewhat strange.** While standard attribute setup (e.g., `attr(df$id, "label") <- "Person's ID")` gives `df$id` = `[1] 1 2 3 attr(,"label") [1] "Person's ID"`, the output after applying the for loop gives `df$id` = `[1] 1 2 3 attr(,"label") id Person's ID Levels: Person's age when taking the survey Person's eye color Person's ID`. Any idea how to fix this? – Emman Sep 16 '19 at 13:12
  • 1
    Seems that you are assigning a factor. Try to convert to character, for instance, using `stringsAsFactors=FALSE` in `read.csv`. Missed the "label" argument to `attr`, sorry. – Karsten W. Sep 16 '19 at 13:26
  • Unfortunately, for some reason **I can't scale this solution up to my real dataset.** After running the loop, I end up getting `NULL` when executing `attr(dataset$col, "label")`. I'm unsure what's wrong and not being generalized from our working example in this post to my larger dataset. Any ideas how to inquire and test? It would otherwise be problematic to upload the entire dataset here. Thanks! – Emman Sep 16 '19 at 16:31
  • Hmm. Did you look at `label_dictionary` and does it look ok? Does setting `attr` work when entering the label manually? – Karsten W. Sep 18 '19 at 19:41
  • I figured it out. My data object was a `tibble`, and not a `data.frame`, which is [apparently documented](https://stackoverflow.com/questions/48157225/how-to-add-metadata-to-a-tibble) that tibbles don't accept attributes well. Converting to a `data.frame` solved the issue. However, can you think of a way to implement your solution within a `magritrr` pipe? Because currently, the solution relies on the assignment operator (`<-`), which breaks the pipe. I hope it's not considered an off-topic. Thanks! – Emman Sep 25 '19 at 15:53
  • This doesn't work whether it's a tibble or not because there is an important difference between df$var and df[ ,var] as far as attr is concerned. If I use attr(df$var", "label") I get the same correct results as attributes(df) would suggest. But if I use attr(df[ ,var], "label") I get NULL. So I conclude df[ ,var] references the list elements not the list itself, and hence returns no list attributes. – Peter King Oct 27 '22 at 23:07