
I'm using readr::write_csv() to export a data.frame as a .csv file for another system. In doing so, I noticed that write_csv() writes some numbers with more decimal places than they have in R, which is a problem for the other application.

You can reproduce this behavior with the following code:

options(digits = 15)
x = structure(list(rsds = c(1.40066760539752, 1.62027378202433, -1.44847242961156, 
                        1.69761995098528, -0.942939230819493, -0.068529066008811, 1.76822135039912, 
                        -1.20101547762003, -0.829135203700728, 1.26695938729229, -2.01720249708251, 
                        1.81301280008168), co2 = c(-0.773101574330299, -0.773101574330298, 
                                                   -0.773101574330298, -0.773101574330298, -0.773101574330298, -0.773101574330298, 
                                                   -0.773101574330298, -0.773101574330298, -0.773101574330298, -0.773101574330298, 
                                                   -0.773101574330297, -0.773101574330298)), row.names = c(NA, -12L
                                                   ), class = "data.frame")
y <- round(x,7)
y
library(readr)
write_csv(y, "y.csv")
read.csv2("y.csv")

The output:


### Rounded output

> y
         rsds        co2
1   1.4006676 -0.7731016
2   1.6202738 -0.7731016
3  -1.4484724 -0.7731016
4   1.6976200 -0.7731016
5  -0.9429392 -0.7731016
6  -0.0685291 -0.7731016
7   1.7682214 -0.7731016
8  -1.2010155 -0.7731016
9  -0.8291352 -0.7731016
10  1.2669594 -0.7731016
11 -2.0172025 -0.7731016
12  1.8130128 -0.7731016

### Numbers altered by write_csv()

> read.csv2("y.csv")
                          rsds.co2
1    1.4006676,         -1.6202738
2   1.6202738,-0.77310160000000006
3  -1.4484724,-0.77310160000000006
4     1.69762,-0.77310160000000006
5  -0.9429392,          -1.7682214
6  -0.0685291,-0.77310160000000006
7   1.7682214,-0.77310160000000006
8  -1.2010155,          -0.9429392
9  -0.8291352,-0.77310160000000006
10  1.2669594,          -1.4006676
11  -2.0172025,         -1.2669594
12  1.8130128,-0.77310160000000006

As you can see, some numbers stayed rounded while others gained additional decimals after being saved as .csv (in this example just one value, but I saw the same thing happening with other numbers in my original dataset). Is this an inherent flaw of write_csv(), or is there a fix for it within the tidyverse?

Marcos
  • I think you messed up something in that last data chunk. I can't reproduce the values that aren't -0.773.... after reading the csv. – Dason Jul 16 '20 at 16:02
  • Is there any reason why you're using readr's write_csv() for writing but not read_csv() for the corresponding reading? – mabreitling Jul 16 '20 at 16:26
  • Hi Dason, were you not able to reproduce the values in read.csv2("y.csv"), or did the code not work? – Marcos Jul 16 '20 at 18:22
  • Hello mabreitling. I'm using read.csv2 because it preserves the actual values saved in the csv file. If I load it with any other read function, the csv is converted to a data.frame and the print command rounds the values, so I'm not able to show the problem. And as I need to read this .csv in GAMS afterwards, all these extra digits become a problem. – Marcos Jul 16 '20 at 18:24
  • Fix the "other application." Why in the world are you using something which requires fixed-length records? – Carl Witthoft Jul 16 '20 at 19:14
  • Then don't use `print` but do a formatted write (sprintf, e.g.) if you want to see the full values – Carl Witthoft Jul 16 '20 at 19:15
  • Why aren't you using `utils::write.csv` ? Further, it looks like a simple case of converting the input strings from the CSV file to the closest binary-rep floats. What does the actual CSV file contain? – Carl Witthoft Jul 16 '20 at 19:16
  • This code is part of a modeling framework and we use the tidyverse for all our packages, which is why I'm trying to find a solution within the tidyverse, if possible. GAMS (General Algebraic Modeling System) is a modeling system for optimization that reads these .csv files and, unfortunately, it has a limit on the number of digits it can import. write.csv/write.table work. My question is why write_csv has this undesirable behavior, which can be limiting (or at least annoying) for people working with numeric simulations. – Marcos Jul 16 '20 at 20:10
  • My concern is not how the numbers are printed back in R; I did that just to make the visualization easier. However, one can open the saved file in Notepad and observe that the numbers have different lengths. – Marcos Jul 16 '20 at 20:13

1 Answer


One of the problems is that you write with write_csv(), which uses a comma (,) as the separator, and then read with read.csv2(), which assumes a semicolon (;) separator and therefore reads both columns as one. If you use the proper separator, the problem disappears in R.

read.csv("y.csv")
         rsds        co2
1   1.4006676 -0.7731016
2   1.6202738 -0.7731016
3  -1.4484724 -0.7731016
4   1.6976200 -0.7731016
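
The separator mismatch can be checked in isolation with a minimal base R sketch. The file contents here are written by hand with writeLines() (an assumption, so that the example does not depend on readr), and `tmp`, `wrong`, `right` and `raw` are illustrative names only.

```r
# Write a small comma-separated file by hand (stand-in for the write_csv() output).
tmp <- tempfile(fileext = ".csv")
writeLines(c("rsds,co2",
             "1.4006676,-0.7731016",
             "1.6202738,-0.7731016"), tmp)

# read.csv2() expects sep = ";" and dec = ",", so each line becomes one field.
wrong <- read.csv2(tmp)

# read.csv() expects sep = "," and dec = ".", which matches the file.
right <- read.csv(tmp)

ncol(wrong)   # one merged column
ncol(right)   # two numeric columns

# readLines() shows the text stored in the file, without any parsing or rounding.
raw <- readLines(tmp)
raw
```

To see exactly which digits a writer put in the file, readLines() is probably the safer tool, since it returns the lines as plain text.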

Regarding the output CSV file, though, the problem persists. I would suggest using the base read/write functions for both output and input.

write.csv(y, "z.csv", row.names = FALSE)

This renders the CSV file correctly.

"rsds","co2"
1.4006676,-0.7731016
1.6202738,-0.7731016
-1.4484724,-0.7731016
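
If the number of decimals has to be guaranteed regardless of the writer, one option (in line with the sprintf suggestion in the comments) is to format the numbers as text before writing. The following is only a sketch under assumptions: the two-row data frame, the 7-decimal format and the names `y_small`, `y_txt` and `out` are illustrative, and base write.csv() is used so the example runs without readr. The first lines show how the same rounded value looks at 17 versus 15 significant digits, which is consistent with the long form seen in the question.

```r
# A rounded value is stored as the nearest binary double.
x <- round(-0.773101574330298, 7)
sprintf("%.17g", x)      # 17 significant digits expose the stored double
format(x, digits = 15)   # 15 significant digits give the short form

# Illustrative data: two rows taken from the rounded data frame in the question.
y_small <- data.frame(rsds = c(1.4006676, 1.69762),
                      co2  = c(-0.7731016, -0.7731016))

# Convert every column to text with exactly 7 decimals.
y_txt <- y_small
y_txt[] <- lapply(y_small, function(col) sprintf("%.7f", col))

# Write the text columns unquoted, so the file holds exactly these digits.
out <- tempfile(fileext = ".csv")
write.csv(y_txt, out, row.names = FALSE, quote = FALSE)
readLines(out)
```

Because the columns are already character strings, the writer has no floating-point conversion left to do. The same pre-formatted data frame could presumably be passed to write_csv() as well, though that is an untested assumption here.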
knytt