Is there a recommended way of "tidy type casting", i.e. of coercing the columns of a tibble to desired types/classes based on a target specification?

Since vctrs seems to come up with new tidy "nuts and bolts" for vectors, I'd probably prefer a solution that's based on vctrs. While I have something that works, I was wondering if there are better ways of "tidy type casting" (if that's the correct conceptual term for it) than using a mix of:

  1. base R things like factor() and numeric()
  2. methods of vctrs::vec_cast()
  3. and handling the map part via purrr::map2_df()

This is what I could come up with so far:

library(magrittr)
#> Warning: package 'magrittr' was built under R version 3.5.2

# Data ----
df <- tibble::tribble(
  ~col_a, ~col_b,
  "a",   "1",
  "b",   "2",
  "c",   "3"
)

# Approach via readr::cols and readr::type_convert -----
col_types <- readr::cols(
  readr::col_factor(),
  readr::col_double()
)

df %>% 
  readr::type_convert(col_types = col_types)
#> # A tibble: 3 x 2
#>   col_a col_b
#>   <chr> <dbl>
#> 1 a         1
#> 2 b         2
#> 3 c         3

# Approach via vctrs::vec_cast -----
col_types <- list(
  factor(),
  numeric()
)

df %>%
  purrr::map2_df(col_types, function(.x, to) {
    vctrs::vec_cast(.x, to)
  }) 
#> # A tibble: 3 x 2
#>   col_a col_b
#>   <fct> <dbl>
#> 1 a         1
#> 2 b         2
#> 3 c         3

Created on 2019-01-11 by the reprex package (v0.2.1)

What surprised me is that the approach via readr::type_convert() seems to ignore the fact that col_a should become a factor.

Steffen Moritz
Rappster

1 Answer

The cols() function expects named parameters. So

col_types <- readr::cols(
  col_a = readr::col_factor(),
  col_b = readr::col_double()
)

would work with

df %>% 
  readr::type_convert(col_types = col_types)
MrFlick
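
To check the named specification from the answer above, here is a minimal sketch that rebuilds the question's `df` and inspects the resulting column classes. It assumes the tibble and readr packages are installed; the variable name `converted` is only illustrative and does not appear in the original thread.

```r
# Assumption: tibble and readr are installed.
# Rebuild the example data from the question.
df <- tibble::tribble(
  ~col_a, ~col_b,
  "a",   "1",
  "b",   "2",
  "c",   "3"
)

# Named specification: each entry refers to a column by its name.
col_types <- readr::cols(
  col_a = readr::col_factor(),
  col_b = readr::col_double()
)

# `converted` is a hypothetical name used only for this sketch.
converted <- readr::type_convert(df, col_types = col_types)

# Inspect the class of each column after conversion.
sapply(converted, class)
```

With the names in place, `col_a` should come back as a factor rather than staying a character column.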
  • Thanks for pointing that out. A bit strange though that it works for the double, but not for the factor, isn't it? – Rappster Jan 11 '19 at 17:17
  • Well, if you don't pass any parameters, it will convert it to double anyway (see `df %>% readr::type_convert()`), so it's basically just ignoring what you pass in. The default is `col_guess()`, so it guesses that, since all your character values are numbers, you wanted a numeric column. – MrFlick Jan 11 '19 at 17:20