
I have a dataset created in an R session that I want to 1) export as CSV and 2) save together with its readr-type column specification, stored in a separate file. This will let me read the data back later with read_csv(), passing the specification saved in 2) as col_types.

Problem: the column specification (the spec attribute) is only available for data read with a read_* function. It does not seem possible to obtain a column specification directly from a dataset created within R.

My workflow so far is:

  1. Export item: write_csv()

  2. Read specification from the exported file: spec_csv().

  3. Save the column specification: write_rds()

  4. Then finally read_csv(step_1, col_types=step_3)

But this is error-prone, as spec_csv() can get it wrong: it only guesses the types from the file contents, so when all values in a column are NA it has to assign an arbitrary class (character). Ideally I would like to extract the column specification directly from the original dataset, instead of writing the file and re-loading it. How can I do that? That is, how can I convert the column classes of a data frame to a spec object?

Thanks!

Actual (inefficient) workflow:

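library(tidyverse)

# 1. Export the data
write_csv(iris, "iris.csv")

# 2-3. Guess the column specification from the exported file and save it
spec_csv("iris.csv") %>%
  write_rds("col_specs_path.rds")

# 4. Read the data back, using the saved specification
read_csv("iris.csv", col_types = read_rds("col_specs_path.rds"))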
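To illustrate how the guess can go wrong, here is a minimal sketch using a hypothetical data frame `df` (not part of my real data) whose `score` column is numeric in R but contains only NA. Which type spec_csv() falls back to may depend on the readr version, but it cannot recover the original double type from the file alone:

```r
library(readr)

# Hypothetical example data: `score` is numeric in R but holds only NA
df <- data.frame(id = 1:3, score = NA_real_)

# Write to a temporary file so the example leaves nothing behind
path <- tempfile(fileext = ".csv")
write_csv(df, path)

# In the R session the column is numeric
class(df$score)

# spec_csv() only sees the text "NA" in the file, so it has to guess
spec <- spec_csv(path)

# The guessed collector for `score` is not a double collector
spec$cols$score
```

Reading the file back with this guessed specification would therefore change the class of `score`, which is exactly what I want to avoid.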
Matifou
  • Can you A) say what situations allow `spec_csv` to "get it wrong", and B) post an example where this actually happens? – IRTFM Mar 23 '17 at 22:33
  • Sure, I added a discussion of this, although it is not really the main point of the post. Having to run `spec_csv` on a file can also be slow. – Matifou Mar 23 '17 at 22:57
