
I'm trying to read a CSV file that looks like this (let's call it test1.csv):

test_1;test_2;test_3;test_4
Test with Ö Ä;20;10,45;15,34

As you can see, the values are separated by ; rather than , because , is the decimal separator. I've included "Ö" and "Ä" because my data contains German letters, which requires me to set the encoding to ISO-8859-1 via locale() in read_delim(). This detail isn't essential; it just explains why I want to use read_delim().

Now I would read all this using read_delim():

read_delim("test1.csv", delim = ";", locale = locale(encoding = 'ISO-8859-1', 
           decimal_mark = ","))

Giving me this:

# A tibble: 1 x 4
  test_1              test_2 test_3 test_4
  <chr>               <dbl>  <dbl>  <dbl>
1 "Test with Ö Ä"     20   10.4   15.3

And indeed, I can get the 10.45 value out by using pull(test_3): [1] 10.45

But now if I simply add five 0s to the 10.45, making it 1000000.45, like so (let's call this test2.csv):

test_1;test_2;test_3;test_4
Test with Ö Ä;20;1000000,45;15,34

And then repeat everything, I completely lose the .45 behind the 1000000.

read_delim("test2.csv", delim = ";", locale = locale(encoding = 'ISO-8859-1', decimal_mark = ",")) %>% pull(test_3)
Rows: 1 Columns: 4
── Column specification ────────────────────────────────────────────────
Delimiter: ";"
chr (1): test_1
dbl (3): test_2, test_3, test_4

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
[1] 1000000

I must be able to retain this information, no? Or control this behaviour? Is this a bug?

Ben Bolker
Moritz Schwarz

1 Answer


This is a printing issue.

If you add %>% print(digits = 22) to the end of your workflow you get:

[1] 1000000.449999999953434
  • this is not exactly 1000000.45 because the value shown is the closest approximation available in the standard (double-precision) floating-point system;
  • the default getOption("digits") value is 7; you can set this however you like with options(digits = <your_choice>). In this case anything between digits = 10 and digits = 17 will get you a printed result of "1000000.45"; digits = 18 starts to reveal the underlying approximation.
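The points above can be checked without readr at all. What follows is a minimal sketch in base R, assuming the parsed value is the ordinary double 1000000.45; the variable names x, shown, and fixed are made up for illustration. It prints the same number at several precisions and shows two ways to get a string with the decimals kept.

```r
# Assumption: x stands in for the value that read_delim() parsed from "1000000,45".
x <- 1000000.45

# Default printing uses getOption("digits") (7 unless changed),
# which is too few significant digits to show the .45.
print(x)

# Asking for more significant digits reveals the stored decimals.
print(x, digits = 10)

# With 22 digits the underlying floating-point approximation becomes visible.
print(x, digits = 22)

# Formatting to a string with enough significant digits keeps the decimals.
shown <- format(x, digits = 10)

# sprintf() with a fixed number of decimal places is another option.
fixed <- sprintf("%.2f", x)

# Raise the session-wide default, then restore the previous setting.
old <- options(digits = 10)
print(x)
options(old)
```

Whether to change options(digits = ...) globally or to pass digits per call is a matter of taste; the per-call form avoids surprising other code in the same session.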
Ben Bolker
  • I would be shocked if there wasn't a duplicate of this somewhere, but I can't find it (this is **not** the same as "why aren't these numbers equal" ?) – Ben Bolker Apr 26 '22 at 22:55
  • Ahh such a rookie mistake... Right, I'm glad then, many thanks for your answer! I feel this is then probably not a fantastic question to remain available - in the end it has nothing to do with `readr`... – Moritz Schwarz Apr 29 '22 at 09:41