
I am trying to convert a tibble to a parameter list for a function call. I want to create a simple file-specification tibble for reading in multiple fixed-width files with varying columns. That way I only need to specify which columns are in a file using pull and select, and the file can then be loaded and parsed automatically. However, I am running into problems using the cols object to specify column formats.

For this example, let's assume I have a tibble of the format:

> (filespec <- tibble(ID = c("Title", "Date", "ATTR"), Length = c(23, 8, 6), Type = c("col_character()", "col_date()", "col_factor(levels=c(123456,654321)")))
# A tibble: 3 x 3
     ID Length                               Type
  <chr>  <dbl>                              <chr>
1 Title     23                    col_character()
2  Date      8                         col_date()
3  ATTR      6 col_factor(levels=c(123456,654321)

I want to end up with a cols object of the format:

> (cols(Title = col_character(), Date = col_date(), ATTR=col_factor(levels=c(123456,654321))))
cols(
  Title = col_character(),
  Date = col_date(format = ""),
  ATTR = col_factor(levels = c(123456, 654321), ordered = FALSE)
)

From other questions I have read, I know this can be done with do.call, but I cannot figure out how to convert the ID and Type columns to a cols object in an automated manner. Here is an example of what I tried:

> do.call(cols, select(filespec,ID, Type))
Error in switch(x, `_` = , `-` = col_skip(), `?` = col_guess(), c = col_character(),  : 
  EXPR must be a length 1 vector

I assume the select needs to be wrapped in another function that performs the row-to-parameter mapping. How is this done?

Konrad Rudolph
  • You *might* be able to do this with `do.call` but your code, as it stands, doesn’t remotely do what you want; you need to first understand what `do.call` actually does before you can use it. – Konrad Rudolph Sep 06 '17 at 16:48
  • I am new to R so this is all a learning experience. I think I understand what do.call does, it calls a function with the other parameters as the arguments. As per my comment on the answer below I think what is escaping me here is how to create a named list in an automated fashion. I don't want to have to type out all of the field=type parameters by hand, I have them in two columns, I just want R to create the named list for me. – RandomString Sep 06 '17 at 17:35
  • Yes, you’re actually spot-on in your problem description. From your question it didn’t seem like you understood this. But this part of the problem is actually easily solvable using `setNames`. Another, bigger problem is that your parameters are strings, not code. As such, you’d first need to evaluate them and while this is possible (via parse/eval), it’s messy and probably not a good idea to begin with (well; it might be, in your case). Joran’s approach is superior. – Konrad Rudolph Sep 07 '17 at 11:47

2 Answers

I might approach this a little differently, and store the file specs in a simple list:

library(purrr)
library(readr)
filespec <- list(Title = list(Length = 23,
                              Type = col_character()),
                 Date = list(Length = 8,
                             Type = col_date()),
                 # levels must be a single vector, hence c(...)
                 ATTR = list(Length = 6,
                             Type = col_factor(levels = c(123456, 654321))))

# Extract the Type element of each entry.
a <- map(filespec, "Type")
> invoke(.f = cols,.x = a)

cols(
  Title = col_character(),
  Date = col_date(format = ""),
  ATTR = col_factor(levels = c(123456, 654321), ordered = FALSE, include_na = FALSE)
)

or,

> invoke(.f = cols,.x = a[c('Title','ATTR')])
cols(
  Title = col_character(),
  ATTR = col_factor(levels = c(123456, 654321), ordered = FALSE, include_na = FALSE)
)
joran
  • I like this solution and it works! The primary reason I was using a tibble is that in the end I might have 50-60 columns, and maintaining that list in a source file could be annoying, so I was hoping to read it in from a CSV file. Is there an easy way to take the two columns I need out of the tibble and turn them into a list? I am new to R, and I think the method for creating a named list in an automated fashion is escaping me. – RandomString Sep 06 '17 at 17:28

tl;dr: There are many things that make this more complex than it seems. But it’s feasible, and the resulting code (provided at the end) isn’t complicated, once the individual parts are understood.

As discussed in the comments, I fundamentally prefer Joran’s approach. In fact, whenever you find yourself storing code expressions in character strings, this should set off alarm bells: it’s an anti-pattern known as stringly typed code (a riff on, and quite the opposite of, strongly typed code). Unfortunately R is quite full of stringly typed code.

That said, your use-case (file-based configuration) is in itself a good idea. I would consider storing the information in a different format than R code fragments. But, well, it does work. So let’s explore why your code doesn’t work.

The first problem is this: you pass a tibble to do.call. Tibbles are lists of columns, so do.call allows this. However, internally your call is transformed to something equivalent to:

cols(
    ID = c("Title", "Date", "ATTR"),
    Type = c("col_character()", "col_date()", "col_factor(levels=c(123456,654321))")
)

— But this isn’t the code we want at all!
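To see why the call expands this way, here is a minimal base-R sketch. Note that show_args is a hypothetical helper invented for illustration; it is not part of readr or of the original code.

```r
# Hypothetical helper: reports the argument names it was called with.
show_args <- function(...) names(list(...))

# do.call spreads the list elements out as individual named arguments,
# so this is the same as calling show_args(a = 1, b = 2).
plain_names <- do.call(show_args, list(a = 1, b = 2))

# A data frame (or tibble) is a list of columns, so each whole column
# becomes one argument, named after the column.
spec_names <- do.call(show_args, data.frame(ID = 1:3, Type = 4:6))
spec_names   # "ID" "Type"
```

In other words, the function receives two arguments (one per column), not one argument per row.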

We need to fix two things here:

  1. We need to use the Type column as argument values, and the ID column as argument names. We can do this by creating a new list that has ID as names and Type as values: setNames(Type, ID).

  2. cols does not know what to do with character string arguments. It needs column specifications, that is, collector objects such as the one returned by col_date().

    Put differently, it’s a huge difference whether you write "col_date()" or col_date().
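The two points above can be illustrated with a small base-R sketch. The vectors ids and types are made-up stand-ins for the ID and Type columns, assumed here only for illustration.

```r
# Stand-ins for the ID and Type columns of the file specification.
ids   <- c("Title", "Date", "ATTR")
types <- c("col_character()", "col_date()", "col_factor()")

# Point 1: setNames attaches the IDs as names; as.list gives a named list
# that has the shape do.call expects.
args <- as.list(setNames(types, ids))
names(args)        # "Title" "Date" "ATTR"

# Point 2: the values are still plain strings, not collector objects.
class(args$Date)   # "character"
```

The naming problem is solved at this point, but the values still need to be turned from text into actual objects.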

To fix this, we need to do something fairly complex: we need to parse the Type column as R code, and we need to evaluate the resulting parsed expressions. R provides two handy functions (parse and eval, respectively) to accomplish this. But don’t let the existence of two easy functions fool you: it’s an incredibly complex operation. R essentially needs to run a full parser and interpreter on your code fragments. And it gets even hairier if the code isn’t what you expect. For instance, the text might contain the code unlink('/', recursive = TRUE) instead of col_date(). R would then happily erase your hard drive.

This is just one of the reasons why parse/eval is complex and generally avoided. Other reasons include: what happens if there’s a parse error in the code (in fact, your code does contain a missing closing parenthesis …)?
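One possible way to deal with the parse-error case is to wrap parse in tryCatch. This is only a sketch: safe_parse is a hypothetical helper name, and it catches syntax errors only. It does nothing about the security concern, because parsing does not inspect what the code would do when evaluated.

```r
# Hypothetical helper: returns the parsed expression,
# or NULL if the text is not syntactically valid R.
safe_parse <- function(text) {
    tryCatch(parse(text = text), error = function(e) NULL)
}

# The string from the question is missing its final ")", so parsing fails.
broken <- safe_parse("col_factor(levels=c(123456,654321)")
is.null(broken)   # TRUE

# With the parenthesis restored, parsing succeeds (nothing is evaluated yet).
fixed <- safe_parse("col_factor(levels=c(123456,654321))")
is.null(fixed)    # FALSE
```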

But here we go. Now that we have all the pieces together, we can join them relatively easily:

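library(dplyr)
library(readr)
filespec %>%
    # Parse each type string into an unevaluated R expression.
    mutate(Parsed = lapply(Type, function (x) parse(text = x, encoding = 'UTF-8'))) %>%
    # Evaluate each expression to obtain the actual column specification.
    mutate(ColSpec = lapply(Parsed, eval)) %>%
    # Name the specifications after the IDs, then pass them to cols.
    with(setNames(ColSpec, ID)) %>%
    do.call(cols, .)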

Execute this code piece by piece to see what it does and convince yourself that it’s working correctly.
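As a rough sketch of the alternative mentioned earlier (storing something other than R code fragments in the configuration), the Type column could hold plain type names that are looked up in a fixed table. Everything named here (collectors, spec, the "Amount" column and the type names) is a made-up assumption for illustration, and this simple version cannot express per-column options such as factor levels.

```r
library(readr)

# Hypothetical lookup table from plain type names to readr constructors.
collectors <- list(
    character = col_character,
    date      = col_date,
    double    = col_double
)

# Hypothetical specification, as it might be read from a CSV file.
spec <- data.frame(
    ID   = c("Title", "Date", "Amount"),
    Type = c("character", "date", "double"),
    stringsAsFactors = FALSE
)

# Reject any type name that is not in the lookup table.
stopifnot(all(spec$Type %in% names(collectors)))

# Call each constructor, name the results by ID, and pass them to cols.
col_objs <- lapply(collectors[spec$Type], function(make) make())
col_spec <- do.call(cols, setNames(col_objs, spec$ID))
```

Because only names from the table are accepted, no text from the configuration file is ever evaluated as code.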

Konrad Rudolph
  • The setNames / with part is exactly what I needed. I knew about the eval problem but was going to fix it at a later date, most likely with a simple mapping from type strings -> S3 objects. – RandomString Sep 07 '17 at 13:41