6

I have run into an issue where even when I disable exponential notation, fwrite prints the number in exponential notation. An example:

library(data.table)
options(scipen = 999)
testint = c(500000)

When I print to the console, R behaves and does not use exponential notation:

print(testint)
[1] 500000
print(list(testint))
[[1]]
[1] 500000

But when I do:

fwrite(list(testint), "output")

The content of the file is 5e+05. I suspect this issue may specifically be with fwrite, as when I do:

write(testint, "output1")

The content of the output file is 500000.

Is there any way to prevent fwrite from doing this? I could switch to using write, but there is a massive speed difference between them and I am writing a lot of data, so there would be a significant performance impact that I would like to avoid if possible. Thanks!

Edit: if anyone is interested, there is an existing open GitHub issue for this, which I found after I asked the question!

Walker in the City
  • Could you not convert numbers manually to character using `sprintf`, `formatC` and friends? –  May 02 '18 at 20:42
  • @dash2 Yep, I could, but I am looking to see if there is a solution that does not require that I format every vector I print. And also possibly an explanation for why `fwrite` displays this behavior when other functions to write to file do not. – Walker in the City May 02 '18 at 20:48
  • Mmm. The `fwrite` doc talks about "exactly matching `write.csv`" so that may be the reason they do this. But `write.csv` doesn't seem to have the same problem. I'd file an issue. –  May 02 '18 at 20:52
  • `fwrite` outputs using `C` code. I think you need to disable scientific notation in the C environment but I'm not sure how that's done through R. – CPak May 02 '18 at 21:16
  • @dash2 I think I will end up filing an issue, thanks. – Walker in the City May 02 '18 at 21:40

2 Answers

4

If you look at the source code of the fwrite() function, you will see that it passes your values straight to an internal C function:

> fwrite
function (x, file = "", append = FALSE, quote = "auto", sep = ",",
    sep2 = c("", "|", ""), eol = if (.Platform$OS.type == "windows") "\r\n" else "\n",
    na = "", dec = ".", row.names = FALSE, col.names = TRUE,
    qmethod = c("double", "escape"), logicalAsInt = FALSE, dateTimeAs = c("ISO",
        "squash", "epoch", "write.csv"), buffMB = 8, nThread = getDTthreads(),
    showProgress = getOption("datatable.showProgress"), verbose = getOption("datatable.verbose"))
{
...
    .Call(Cwritefile, x, file, sep, sep2, eol, na, dec, quote,
        qmethod == "escape", append, row.names, col.names, logicalAsInt,
        dateTimeAs, buffMB, nThread, showProgress, verbose)
    invisible()
}

If you look at the source code of the C function that is called, https://github.com/Rdatatable/data.table/blob/master/src/fwrite.c, you will notice that it does not check any options set in R (such as scipen) and uses scientific notation for large enough values. You could change this source however you like, build your own dynamic library, and call it from R. The other option would be to use one of the standard R writing functions (though I suspect you want the performance of the data.table functions).
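To see exactly what fwrite emits on your installation, a minimal sketch like the one below may help. It writes to a temporary file and reads the raw text back. It rests on an assumption that is not confirmed anywhere in this thread: that newer data.table releases expose a `scipen` argument on fwrite. The code checks the function's formals first and falls back to a plain call if the argument is absent, so it should run either way. The names `dt`, `tmp`, `has_scipen` and `written` are just illustrative.

```r
library(data.table)

testint <- c(500000)
dt <- data.table(x = testint)
tmp <- tempfile(fileext = ".csv")

# Assumption: newer data.table releases may have a `scipen` argument on
# fwrite(). Inspect the formals rather than relying on a version number.
has_scipen <- "scipen" %in% names(formals(fwrite))

if (has_scipen) {
  # Penalise scientific notation for this call only.
  fwrite(dt, tmp, col.names = FALSE, scipen = 50)
} else {
  # Older releases: no such argument, so write with the defaults.
  fwrite(dt, tmp, col.names = FALSE)
}

# Read the raw text back to see which notation was actually used.
written <- readLines(tmp)
print(has_scipen)
print(written)
```

If `has_scipen` comes back FALSE, the file will most likely contain the exponential form described in the question, and converting the column before writing is the remaining option.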

Katia
  • Thank you for taking the time to look through this so thoroughly! It seems I am out of luck unless I drag my limited C knowledge out of storage :). – Walker in the City May 02 '18 at 21:48
1

Would this be an acceptable workaround? (It would end up rounding to whatever number of decimal places is set by the digit after the period in the format string.)

fwrite(list(sprintf("%9.2f", testint)))
500000.00

The response on the issue page you cited suggested using bit64::as.integer64 from the bit64 package, but ordinary as.integer seems to work here:

fwrite(list(as.integer(testint)))
500000
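
If there are many columns, the as.integer idea can be applied in one pass rather than vector by vector. The following is only a sketch: `whole_to_int` is a hypothetical helper written for this answer, not part of data.table, and it assumes that every value fits in R's 32-bit integer range (anything larger would need bit64::as.integer64 instead). Columns with a class attribute, such as dates, are left alone.

```r
library(data.table)

# Hypothetical helper (not part of data.table): converts, by reference, every
# plain double column that holds only finite whole numbers within the 32-bit
# integer range. Classed columns (Date, POSIXct, integer64) are skipped.
whole_to_int <- function(dt) {
  for (col in names(dt)) {
    v <- dt[[col]]
    if (is.double(v) && !is.object(v) &&
        all(is.finite(v)) && all(v == round(v)) &&
        all(abs(v) <= .Machine$integer.max)) {
      set(dt, j = col, value = as.integer(v))
    }
  }
  invisible(dt)
}

dt <- data.table(a = c(500000, 1e6), b = c(1.5, 2), c = c("x", "y"))
whole_to_int(dt)

# Column a is now integer; b and c are untouched.
tmp <- tempfile(fileext = ".csv")
fwrite(dt, tmp)
out <- readLines(tmp)
print(out)
```

The conversion scans each column once in vectorised R code, so it should cost much less than formatting every value to a string, though I have not benchmarked it.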
IRTFM
  • Using the format function one can get the desired output, but this will affect performance (and performance is the reason this package is used instead of the regular base R functions). – Katia May 02 '18 at 22:44
  • I am actually currently using `format` as a similar solution, but unfortunately as @Katia said I am looking to maximize performance. – Walker in the City May 02 '18 at 22:59