
I have a database with 250 columns and want to read only 50 of them, instead of loading all of them and then dropping columns with dplyr::select. I suppose I can do that using a column specification, but I don't want to type the specification manually for all those columns.

The 50 columns I want to keep share a common prefix, say 'blop', so I manually modified the column specification object returned by readr::spec_csv and then used it to read my data file:

library(readr)

short_colspec <- spec_csv('myfile.csv')
# keep columns whose name starts with 'blop'; skip all others
# (sapply with simplify = FALSE returns a *named* list, so the
#  spec still matches columns by name; lapply would drop the names)
short_colspec$cols <- sapply(
    names(short_colspec$cols),
    function(name) {
        if (substr(name, 1, 4) == 'blop') {
            col_character()
        } else {
            col_skip()
        }
    },
    simplify = FALSE
)
short_data <- read_csv('myfile.csv', col_types = short_colspec)

Is there a more robust way to build such a column specification with readr (or any other package) functions than what I did?

Romain
  • Using `readr` it looks like how you've done it is the approach to take. It could potentially be improved with `grepl` as opposed to `substr` so as to be more flexible. Also, your code assumes the columns you are reading in are all `col_character`. – Chris Sep 04 '18 at 15:15
  • Indeed, `grepl` provides more flexibility! I assumed only string variables for the sake of simplicity; it is true that the `if else` syntax can be expanded to take more column types into account. – Romain Sep 05 '18 at 07:30
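Acting on both comment suggestions, the spec can be built with `grepl` and can reuse the types that `spec_csv` already guessed instead of forcing `col_character` on everything (a sketch under the question's assumptions; `^blop` is the prefix pattern from the question):

```r
library(readr)

short_colspec <- spec_csv("myfile.csv")
# keep the already-guessed type for columns matching the prefix, skip the rest
short_colspec$cols <- sapply(
    names(short_colspec$cols),
    function(name) {
        if (grepl("^blop", name)) short_colspec$cols[[name]] else col_skip()
    },
    simplify = FALSE  # keep a named list so the spec matches columns by name
)
short_data <- read_csv("myfile.csv", col_types = short_colspec)
```

Because the matching columns keep their guessed types, this also handles files where the `blop` columns are not all character.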

1 Answer


Using data.table's `fread`, you can either drop the columns you want to skip (`drop =`) or keep only the ones you want (`select =`):

```r
library(data.table)

# read only the header line to decide which columns to keep;
# adjust the split character ';' according to your csv delimiter
header   <- readLines("myfile.csv", n = 1L)
keep_col <- grep("blop", unlist(strsplit(header, ";")))

# read the file, only the desired columns
fread("myfile.csv", select = keep_col)
```
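Equivalently, under the same assumptions about the file, the `drop` argument can be passed the inverted match:

```r
library(data.table)

# drop the columns that do NOT match the prefix
header     <- readLines("myfile.csv", n = 1L)
drop_col   <- grep("blop", unlist(strsplit(header, ";")), invert = TRUE)
short_data <- fread("myfile.csv", drop = drop_col)
```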
Wimpel