Is it possible to specify multiple column types with one assignment in cols() from read_csv?

Instead of:

read_csv2(my_file,
          col_types = cols(.default = 'i',
                           logi_one = 'l',
                           logi_two = 'l',
                           date_one = 'D',
                           date_two = 'D'))

I want to do something like

read_csv2(my_file,
          col_types = cols(.default = 'i',
                           c(logi_one, logi_two) = 'l',
                           c(date_one, date_two) = 'D'))
user1

3 Answers

Here's a wrapper around readr::cols() that allows you to set types on multiple columns at once.

library(tidyverse)

my_cols <- function(..., .default = col_guess()) {
  dots <- enexprs(...)
  colargs <- flatten_chr(unname(
    imap(dots, ~ {
      colnames <- syms(.x)
      colnames <- colnames[colnames != sym("c")]
      coltypes <- rep_along(colnames, .y)
      purrr::set_names(coltypes, colnames)
    })
  ))
  cols(!!!colargs, .default = .default)
}

Example use:

set.seed(1)

# write sample .csv file
write_csv2(
  data.frame(
    int_one = sample(1:10, 10),
    logi_one = sample(c(TRUE, FALSE), 10, replace = TRUE),
    date_one = paste0("2022-01-", sample(10:31, 10)),
    int_two = sample(1:10, 10),
    logi_two = sample(c(TRUE, FALSE), 10, replace = TRUE),
    date_two = paste0("2022-02-", sample(10:28, 10))
  ),
  "my_file.csv"
)

read_csv2(
  "my_file.csv",
  col_types = my_cols(
    .default = 'i',
    l = c(logi_one, logi_two),
    D = c(date_one, date_two)
  )
)
#> # A tibble: 10 x 6
#>    int_one logi_one date_one   int_two logi_two date_two  
#>      <int> <lgl>    <date>       <int> <lgl>    <date>    
#>  1       9 TRUE     2022-01-18       1 FALSE    2022-02-15
#>  2       4 TRUE     2022-01-24       4 FALSE    2022-02-16
#>  3       7 TRUE     2022-01-14       3 FALSE    2022-02-19
#>  4       1 TRUE     2022-01-31       6 TRUE     2022-02-28
#>  5       2 TRUE     2022-01-23       2 TRUE     2022-02-17
#>  6       5 FALSE    2022-01-29       7 FALSE    2022-02-23
#>  7       3 FALSE    2022-01-26       5 TRUE     2022-02-11
#>  8      10 FALSE    2022-01-11       8 FALSE    2022-02-22
#>  9       6 FALSE    2022-01-19       9 FALSE    2022-02-25
#> 10       8 TRUE     2022-01-28      10 TRUE     2022-02-20

Created on 2022-03-05 by the reprex package (v2.0.1)

zephryl
  • Very nice! Not convoluted like mine, haha. You might consider posting something similar on this other older question, that doesn't have any answers: https://stackoverflow.com/q/31885990/15293191 – AndrewGB Mar 05 '22 at 21:34
  • @AndrewGillreath-Brown Thanks! I flagged that older one as a duplicate of this one... which feels a bit backwards [but apparently is the thing to do](https://meta.stackexchange.com/a/147651). – zephryl Mar 05 '22 at 21:43
  • Interesting! That's good to know. – AndrewGB Mar 05 '22 at 21:45

Here is one possibility (though a little complicated and verbose). If you have a list of the columns that you want to change, then we can create a single string for the col_types. From the help for ?read_csv, the col_types argument can take a single string of column shortcuts (e.g., iiDl). Here, I read in the column names, then bind that to the list of columns that need to be changed. Then, I replace any NA with the default type, i, then I collapse all column types into a single string. Then, I use that to define the col_types in read_csv.

library(tidyverse)

col_classes <-
  bind_rows(
    read_csv(my_file, col_types = cols(.default = "c"))[0, ],
    tibble(
      logi_one = 'l',
      logi_two = 'l',
      date_one = 'D',
      date_two = 'D'
    )
  ) %>%
  mutate(across(everything(), ~ replace_na(., "i"))) %>%
  unlist(use.names = FALSE) %>%  # one type code per column, in file order
  paste0(collapse = "")

results <- read_csv(my_file, col_types = col_classes)

However, this obviously would not work for read_csv2. But you could collapse every row back down, like this:

output <-
  data.frame(apply(read_csv(my_file), 1, function(x)
    paste(x, collapse = ",")))

names(output) <- paste(names(results), collapse = ",")
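
The string-building step can also be sketched in base R. Note that compact_types() is a hypothetical helper, not a readr function. The header vector is written out by hand here; in practice it could come from the names of a zero-row read of the file.

```r
# Hypothetical helper: build a compact col_types string such as "ilDilD"
# from the column names of a file and a named vector of type shortcuts.
compact_types <- function(col_names, types, default = "i") {
  # Start with the default shortcut for every column
  spec <- rep(default, length(col_names))
  names(spec) <- col_names
  # Overwrite only the columns that are named in `types` and exist in the file
  known <- intersect(names(types), col_names)
  spec[known] <- types[known]
  # Collapse to a single string, one character per column, in file order
  paste(spec, collapse = "")
}

# Column names assumed to match the sample file used elsewhere in this thread
header <- c("int_one", "logi_one", "date_one", "int_two", "logi_two", "date_two")
types <- c(logi_one = "l", logi_two = "l", date_one = "D", date_two = "D")

col_classes <- compact_types(header, types)
col_classes
#> [1] "ilDilD"
```

The resulting string can then be passed as col_types, since the order of the shortcuts follows the order of the columns in the file.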
AndrewGB

This is my first answer to a Stack Overflow question. I had a similar question a while back, so I played around with this one, and while the above solutions may be valid, I wanted to provide an alternative.

  1. Assign the column names you want to a vector, e.g.
custom_col_logic <- c("logi_one", "logi_two")

custom_col_date <- c("date_one", "date_two")
  2. Then use map() on each vector to create one col_logical() or col_date() collector per column, and assign the column names to the resulting lists.
# create one collector (col_logical or col_date) per column
type_logical <- map(custom_col_logic, ~ col_logical())

type_date <- map(custom_col_date, ~ col_date())

# now assign the column names to these lists
names(type_logical) <- custom_col_logic

names(type_date) <- custom_col_date
  3. Here is the trick: use the as.col_spec() function to turn these two named lists into objects of class col_spec.
type_logical <- as.col_spec(type_logical)
type_date <- as.col_spec(type_date)
  4. Lastly, create an empty cols() specification and add the custom column types to it.
# create an empty column specification
custom_col_type <- cols()

# assign the collectors from before to the cols element of the new specification

custom_col_type$cols <- c(type_logical$cols, type_date$cols)

Then you are done! Now you can pass custom_col_type directly as the col_types argument of read_csv.
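
Here is a compact variant of these steps, as a sketch only. It builds the same named collectors but hands them to cols() through do.call() instead of assigning into $cols. That substitution is mine and is not part of the steps above. The sample data in csv_text is hypothetical, and using I() for literal input assumes readr 2.0 or later.

```r
library(readr)
library(purrr)

# Column names grouped by the type they should get
custom_col_logic <- c("logi_one", "logi_two")
custom_col_date  <- c("date_one", "date_two")

# One collector per column, named after the column it applies to
type_logical <- set_names(map(custom_col_logic, ~ col_logical()), custom_col_logic)
type_date    <- set_names(map(custom_col_date, ~ col_date()), custom_col_date)

# Pass all named collectors to cols() at once; integer is the default type
custom_col_type <- do.call(
  cols,
  c(type_logical, type_date, list(.default = col_integer()))
)

# Hypothetical semicolon-delimited sample, read as literal data via I()
csv_text <- paste(
  "int_one;logi_one;date_one;logi_two;date_two",
  "1;TRUE;2022-01-18;FALSE;2022-02-15",
  "2;FALSE;2022-01-24;TRUE;2022-02-16",
  sep = "\n"
)

result <- read_csv2(I(csv_text), col_types = custom_col_type)
```

Going through do.call() keeps everything within the documented cols() interface, at the cost of one extra list() for the default.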

Thanks!

If you found this helpful, please vote or mark it as the final answer

alejandro_hagan