I have a dataset with 250 columns and want to read only 50 of them, rather than loading everything and then dropping columns with dplyr::select. I suppose I can do that with a column specification, but I don't want to type it out manually for all those columns.
The 50 columns I want to keep share a common prefix, say 'blop', so I manually modified the column specification object returned by readr::spec_csv and then used it to read my data file:
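library(readr)

# Guess the full column specification from the file
short_colspec <- spec_csv('myfile.csv')
col_names <- names(short_colspec$cols)
# Keep columns prefixed with 'blop' as character and skip the rest;
# setNames() restores the column names that lapply() would otherwise drop
short_colspec$cols <- setNames(lapply(col_names, function(name) {
  if (startsWith(name, 'blop')) {
    col_character()
  } else {
    col_skip()
  }
}), col_names)
short_data <- read_csv('myfile.csv', col_types = short_colspec)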
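To make this reproducible without my actual file, here is a minimal sketch of the same approach on a toy CSV. The temporary file, its three column names and the `toy_*` variable names are made up for illustration; only the 'blop' prefix comes from my real data.

```r
library(readr)

# Hypothetical toy file standing in for 'myfile.csv'
toy_path <- tempfile(fileext = '.csv')
writeLines(c('blop_a,other,blop_b', '1,x,2', '3,y,4'), toy_path)

# Guess the full specification, then rewrite it column by column
toy_spec <- spec_csv(toy_path)
toy_names <- names(toy_spec$cols)
toy_spec$cols <- setNames(lapply(toy_names, function(name) {
  # Keep 'blop' columns as character, skip everything else
  if (startsWith(name, 'blop')) col_character() else col_skip()
}), toy_names)

toy_data <- read_csv(toy_path, col_types = toy_spec)
names(toy_data)
```

With this toy file, only the two columns starting with 'blop' should remain in `toy_data`, both read as character.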
Is there a way to build such a column specification with readr (or any other package) functions in a more robust way than what I did?