
I have a long list of very large SAS files that I want to import with `read_sas`. To increase speed and reduce memory usage, I want to import only the columns I am interested in, using `cols_only`.

The trouble is that I have a long list of possible column names, but not every column exists in every dataset. If I pass the full list to `cols_only`, I get the error:

Evaluation error: Column 2 must be named.

Is there a way to suppress this error and have `read_sas` import whichever columns from my list it can find?

  • Update `haven` and use `col_select()`. You can use the helpers available in `dplyr::select` to make more flexible selections. I think they are called tidyselect helpers, but I am on my phone so that will need double-checking. – Andrew Nov 23 '19 at 21:29



As @Andrew mentions in their comment, with haven >= 2.2.0 you can use the new `col_select` argument for this. To select columns that may not exist, use the helper `one_of()`:

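```r
library(haven)
library(tidyselect)

# Write a small example SAS file to a temporary location
f <- tempfile()
write_sas(mtcars, f)

# One column that exists and one that does not
my_cols <- c("mpg", "i-don't-exist")

# one_of() selects the columns that exist and warns about the rest
read_sas(f, col_select = one_of(my_cols))
#> Warning: Unknown columns: `i-don't-exist`
#> # A tibble: 32 x 1
#>      mpg
#>    <dbl>
#>  1  21  
#>  2  21  
#>  3  22.8
#>  4  21.4
#>  5  18.7
#>  6  18.1
#>  7  14.3
#>  8  24.4
#>  9  22.8
#> 10  19.2
#> # ... with 22 more rows
```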
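With tidyselect 1.0.0 or later, `any_of()` is another option: it selects whichever of the named columns are present and silently skips the rest, and the tidyselect documentation describes `one_of()` as superseded in its favour. Below is a minimal sketch of the same example using `any_of()`, assuming current versions of haven and tidyselect. Recent haven releases may print a deprecation warning for `write_sas()`, which is used here only to create a test file.

```r
library(haven)
library(tidyselect)

# Write a small example SAS file to a temporary location
f <- tempfile(fileext = ".sas7bdat")
write_sas(mtcars, f)

# Columns we would like, not all of which exist in the file
my_cols <- c("mpg", "cyl", "i-don't-exist")

# any_of() keeps the names that exist and ignores the missing one,
# without emitting a warning
df <- read_sas(f, col_select = any_of(my_cols))
names(df)
#> [1] "mpg" "cyl"
```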
Mikko Marttila
  • I marked this as accepted because this is a fantastic solution. Unfortunately, due to a table of dependencies, I want to use rio's `import()` function to do this. When I use `import()` I get the error "No tidyselect variables were registered". Is there any way around this problem? – Jonathan Nolan Nov 24 '19 at 08:19
  • @JonathanNolan unfortunately there's no way around that: `col_select` needs to be evaluated in a specific context, but `rio::import()` evaluates all of its input arguments by putting them in a list. This is something that would need to be fixed in the rio package. – Mikko Marttila Nov 24 '19 at 09:46
  • That said, I think this problem is already on their radar, given this recently opened issue: https://github.com/leeper/rio/issues/248 – Mikko Marttila Nov 24 '19 at 09:52
  • Fantastic find! I'll be watching the issue unfold with interest. – Jonathan Nolan Nov 25 '19 at 04:30
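Another approach is to look up each file's column names before the full read and pass only the names that actually exist. The sketch below reads a single row with `n_max` to get the names, intersects them with the wanted list, and then selects with `all_of()`. It assumes haven >= 2.2.0 and tidyselect >= 1.0.0, and `wanted` and `keep` are illustrative names. Because `keep` is an ordinary character vector, it could in principle be passed where a selection helper cannot be evaluated, such as the `rio::import()` case above, but that has not been tested here.

```r
library(haven)
library(tidyselect)

# Write a small example SAS file to a temporary location
f <- tempfile(fileext = ".sas7bdat")
write_sas(mtcars, f)

# Hypothetical list of columns we would like to import
wanted <- c("mpg", "hp", "i-don't-exist")

# Read a single row just to learn which columns the file contains
available <- names(read_sas(f, n_max = 1))

# Keep the wanted names that are present, in the order given in `wanted`
keep <- intersect(wanted, available)

# all_of() is safe here because every name in `keep` is known to exist
df <- read_sas(f, col_select = all_of(keep))
names(df)
#> [1] "mpg" "hp"
```

The extra cost is one very small read per file, which should be negligible next to importing the full data.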