I have some .csv files which I would like to open with the default column type set to "i" for integer. However, certain files also have specific columns which I would like readr::read_csv to open with defined types (the logic for choosing those columns doesn't matter; let's assume I know which ones apply to which files).

Is there a way to pass these columns into the col_types argument of read_csv while still ensuring that every other column is read as an integer?

df <- data.frame(
  a = c(1,2,3,4),
  b = sample(1:100, 4),
  c_text = c("hi", "I", "am", "text"),
  d_decimals = runif(4),
  e_more_text = c("another", "text", "column", "lol")
)

readr::write_csv(df, "/path/to/csv/file.csv")

character_cols <- c("c_text", "e_more_text")
double_cols <- "d_decimals"

data <- readr::read_csv(
  "/path/to/csv/file.csv",
  # supply something here to determine column types
  col_types = cols(.default = "i", character_cols = "c", double_cols = "d")
)

Because of the logic used to work out which columns should be characters, doubles, etc., I'd ideally supply them as vectors of names.

Cheers

Robert Hickman
  • Related: [Override column types when importing data using readr::read_csv() when there are many columns](https://stackoverflow.com/questions/31568409/override-column-types-when-importing-data-using-readrread-csv-when-there-are/) – Ian Campbell Jul 13 '21 at 16:33
  • I'd need to pass the names of the non-integer columns as strings (e.g. "c_text", not c_text) for my example, though, which that question does not answer. I have tried unquoting, but it didn't seem to work (though I believe the preferred method of unquoting has changed recently in the tidyverse). – Robert Hickman Jul 13 '21 at 16:39
  • Related question showing the `do.call()` approach with `cols()`: https://stackoverflow.com/questions/53346557/readr-passing-a-string-of-column-classes. To make this a little more "programmatic" you could build a named vector based on your character/double vectors and then add `.default = "i"` in before `do.call()`. – aosmith Jul 13 '21 at 16:54

1 Answer

You can write a helper function which combines your extra spec with the default column spec, then pulls the full spec together with do.call.

extra_spec = list(
  "c_text" = "c",
  "d_decimals" = "d",
  "e_more_text" = "c"
)

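read_csv_with_default_int = function(path, extra_spec) {
  # Append the integer default to the named overrides, then splice into cols()
  col_spec = do.call(readr::cols, c(extra_spec, list(.default = readr::col_integer())))
  readr::read_csv(path, col_types = col_spec)
}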

read_csv_with_default_int("file.csv", extra_spec = extra_spec)

You could also avoid much of the nested logic with a helper like:

cols_default_int = purrr::partial(readr::cols, .default = readr::col_integer())

read_csv_with_default_int = function(path, col_types) {
  readr::read_csv(path, col_types = do.call(cols_default_int, col_types))
}

read_csv_with_default_int("file.csv", col_types = extra_spec)
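
Since the question asks for the columns to be supplied as vectors of names, the spec can also be built from those vectors before the `do.call()` step, along the lines suggested in the comments. The following is only a minimal sketch: the temporary file and its contents are made up for illustration, and `character_cols` and `double_cols` are the vectors from the question.

```r
library(readr)

# Hypothetical sample file, written to a temporary path for illustration only
path <- tempfile(fileext = ".csv")
write_csv(
  data.frame(
    a = c(1, 2, 3, 4),
    c_text = c("hi", "I", "am", "text"),
    d_decimals = c(0.25, 0.5, 0.75, 1.5),
    e_more_text = c("another", "text", "column", "lol")
  ),
  path
)

character_cols <- c("c_text", "e_more_text")
double_cols <- "d_decimals"

# Map each column name to its compact type code ("c" = character, "d" = double)
extra_spec <- c(
  setNames(rep("c", length(character_cols)), character_cols),
  setNames(rep("d", length(double_cols)), double_cols)
)

# Add the integer default, then pass everything to cols() as named arguments
col_spec <- do.call(cols, c(as.list(extra_spec), list(.default = "i")))

data <- read_csv(path, col_types = col_spec)
```

Here `a` should come back as an integer column because it is not named in either vector, while the named columns take the types given. If a file has no columns of one kind, an empty vector such as `character(0)` contributes nothing to the spec.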
Akhil Nair
  • yep that works- a bit surprised there's that much for what I thought would be implemented in the base function but suffices for my needs! – Robert Hickman Jul 13 '21 at 16:52