I'm seeing unexpected behavior from the excellent readr::read_csv(). When a CSV column contains only strings that begin with "Inf" (e.g. "Inform", "Information"), read_csv() guesses the column type as double and returns numeric Inf for every value instead of the original strings. Base read.csv() reads the same file as character. If the column contains at least one string that does not begin with "Inf" (e.g. "Indigo"), read_csv() correctly reads it as character. read_csv() also reads the column correctly when col_types explicitly declares it as character, but that requires manually checking and editing the column specification.
Do others see this, and if so, is there a read_csv() argument or other workaround that lets it reliably read character columns that happen to contain only strings beginning with "Inf"? Having to inspect every character column first and then manually specify col_types whenever all of its strings begin with "Inf" seems problematic.
Thanks very much, and apologies if I'm just missing something.
suppressPackageStartupMessages(library(tidyverse))
#################################################
# save tibble with character vector containing only strings that begin with "Inf"
test_1 <- tibble(x = c("Inform", "Information"))
test_1 %>% glimpse()
#> Rows: 2
#> Columns: 1
#> $ x <chr> "Inform", "Information"
test_1 %>% write_csv(file = "test_1.csv")
# read_csv() seems to convert the strings into numeric Inf because they all begin with "Inf"
# however, if col_types is manually specified as col_character, then read_csv() correctly reads the vector as a string
read_csv(file = "test_1.csv")
#> Rows: 2 Columns: 1
#> -- Column specification --------------------------------------------------------
#> Delimiter: ","
#> dbl (1): x
#>
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 2 x 1
#> x
#> <dbl>
#> 1 Inf
#> 2 Inf
read_csv(file = "test_1.csv", lazy = FALSE)
#> Rows: 2 Columns: 1
#> -- Column specification --------------------------------------------------------
#> Delimiter: ","
#> dbl (1): x
#>
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 2 x 1
#> x
#> <dbl>
#> 1 Inf
#> 2 Inf
read_csv(file = "test_1.csv", col_types = cols(x = col_character()))
#> # A tibble: 2 x 1
#> x
#> <chr>
#> 1 Inform
#> 2 Information
# read.csv() correctly reads the vector as a string
read.csv(file = "test_1.csv") %>% glimpse()
#> Rows: 2
#> Columns: 1
#> $ x <chr> "Inform", "Information"
# read_csv() correctly reads similar character vectors if they contain at least one string that does not begin with "Inf"
test_2 <- tibble(x = c("Inform", "Indigo", "Information"))
test_2 %>% write_csv(file = "test_2.csv")
read_csv(file = "test_2.csv") %>% glimpse()
#> Rows: 3 Columns: 1
#> -- Column specification --------------------------------------------------------
#> Delimiter: ","
#> chr (1): x
#>
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 3
#> Columns: 1
#> $ x <chr> "Inform", "Indigo", "Information"
#################################################
# get version info
packageVersion("tidyverse")
#> [1] '1.3.1'
version
#> _
#> platform x86_64-w64-mingw32
#> arch x86_64
#> os mingw32
#> system x86_64, mingw32
#> status
#> major 4
#> minor 1.1
#> year 2021
#> month 08
#> day 10
#> svn rev 80725
#> language R
#> version.string R version 4.1.1 (2021-08-10)
#> nickname Kick Things
Created on 2021-10-22 by the reprex package (v2.0.1)
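One possible stopgap, sketched here under assumptions rather than offered as a confirmed fix: since an explicit character specification reads the column correctly in the reprex above, setting the default column type to character avoids type guessing altogether. The temporary file and the names `path` and `as_text` are hypothetical stand-ins for test_1.csv and its result; `cols(.default = col_character())` is the standard readr way to set a default column type.

```r
library(readr)

# Hypothetical temp file standing in for test_1.csv from the reprex
path <- tempfile(fileext = ".csv")
write_csv(data.frame(x = c("Inform", "Information")), path)

# Read every column as character, so no column type is guessed
as_text <- read_csv(path, col_types = cols(.default = col_character()))

# Inspect the resulting column type
print(class(as_text$x))
```

The trade-off is that genuinely numeric columns also arrive as character and would need to be converted afterwards (for example with `as.numeric()`), so this only shifts the manual step rather than removing it.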