I am looking for a fast serialization function to convert a data.frame to a delimited string in R. At the moment I use readr::format_tsv (readr 2.0.0, vroom 1.5.3) for the conversion, and I am wondering whether a faster implementation is available. For the example below, the conversion takes around 4.4 seconds, which is too slow for my purpose.

Timing reported by system.time() for the example below:

   user  system elapsed 
  3.878   0.495   4.372 

Example

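# 35000 rows x 400 columns of uniform random numbers in [0, 100]
df <- data.frame(replicate(400, runif(35000, min = 0, max = 100)))

# Time the conversion of the data frame to a single TSV string
system.time({
  tsv <- readr::format_tsv(df)
})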
Matthias Munz
  • Can you explain why you are doing this? I'm assuming you pass this string to some other software. Do you really need the string in R or could other options be explored? – Roland Oct 01 '21 at 08:44
  • I have developed a REST API and need to serialize the data somehow to send it via HTTP. – Matthias Munz Oct 01 '21 at 09:57
  • I don't know REST but I wonder if you can't pass data in a better way than as a character string. Anyway, I would investigate if you can use a serializer that writes to an HTTP connection directly. Another option might be to write the data to a temporary file with `data.table::fwrite` and somehow pass that file to the API. If your data is purely numeric like in your example I would try to go a different route, i.e., use a matrix as the data structure and pass the numeric vector (serialized if necessary) and the dimensions. – Roland Oct 01 '21 at 10:21
  • Thanks a lot, Roland, for your thoughts. I implemented the API with R/plumber, in case you are interested. I will check out your suggested solutions. Regarding your suggestion with the dimensions, I am not sure it will help to speed things up. The amount of data remains more or less the same, I think, except that I don't need any newline characters anymore. – Matthias Munz Oct 01 '21 at 14:20
  • This is related to my question: https://stackoverflow.com/questions/48233309/fast-concatenation-of-data-table-columns-into-one-string-column Implementing a new function in C/C++ would be the fastest way, I guess. The 'Rcpp' package provides R functions as well as C++ classes that offer seamless integration of R and C++. – Matthias Munz Oct 01 '21 at 14:25
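
The two alternatives Roland mentions in the comments could be sketched roughly as follows. This is only an illustration, not a measured solution: it assumes the data.table package is installed, `df_to_tsv_string` is a hypothetical helper name, and a smaller data frame stands in for the 35000 x 400 example. The string produced by `fwrite` may format numbers differently from `readr::format_tsv`, and the raw-bytes variant assumes sender and receiver share the same byte order.

```r
# Sketch only; assumes the data.table package is installed.
library(data.table)

# Hypothetical helper (not part of any package): write the data frame as TSV
# to a temporary file, then read that file back as a single string.
df_to_tsv_string <- function(df) {
  tmp <- tempfile(fileext = ".tsv")
  on.exit(unlink(tmp), add = TRUE)      # remove the temp file on exit
  fwrite(df, tmp, sep = "\t")           # header row plus tab-separated values
  readChar(tmp, nchars = file.size(tmp), useBytes = TRUE)
}

# Smaller stand-in for the 35000 x 400 example, to keep the sketch quick.
df <- data.frame(replicate(10, runif(1000, min = 0, max = 100)))
tsv <- df_to_tsv_string(df)

# Alternative for purely numeric data: send raw doubles plus the dimensions.
m <- as.matrix(df)
payload <- list(dim = dim(m), data = writeBin(as.vector(m), raw()))

# Receiving side: rebuild the matrix from the bytes and the dimensions.
restored <- matrix(
  readBin(payload$data, what = "double", n = prod(payload$dim)),
  nrow = payload$dim[1], ncol = payload$dim[2]
)
```

Whether either route beats `readr::format_tsv` depends on the data and on disk speed, so it is worth wrapping each call in `system.time()` on the real data before committing to one.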

1 Answer

The performance issue got resolved: https://github.com/r-lib/vroom/issues/377

Matthias Munz