
I am trying to build a file in S3 using write_delim, and I want the values enclosed in double quotes ("). I don't know whether quoting simply isn't a parameter of write_delim (in which case I would need a base R function instead) or whether I am calling it incorrectly. Here is what I tried:

s3write_using(file_filtered,
              FUN = write_delim,
              delim = ",",
              na = "",
              object = paste0(output_path,
                              "file-",
                              lubridate::today(),
                              ".csv"),
              bucket = input_bucket)

s3write_using(file_filtered,
              FUN = write_delim,
              delim = ",",
              na = "",
              quote = "double",
              object = paste0(output_path,
                              "file-",
                              lubridate::today(),
                              ".csv"),
              bucket = input_bucket)




1 Answer


If I understand you correctly, you want to write a csv to your S3 bucket in which the values are enclosed in double quotes.

From the s3write_using documentation:

FUN: For s3write_using, a function to which x and a file path will be passed (in that order).

So all you need to do is define a function that takes an R object as its first argument and writes a quote-enclosed csv to the path passed as its second argument.
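If speed weren't a concern, a minimal sketch of such a function could be built on base R's write.csv, which quotes character values when quote = TRUE (the function name here is just illustrative):

# Quote-enclosing FUN: x is the data, path is supplied by s3write_using
write_csv_quoted_base <- function(x, path) {
  write.csv(x, path, quote = TRUE, na = "", row.names = FALSE)
}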

If you are worried about optimization, readr::write_delim is certainly faster than write.csv, but the data.table package has an even faster function, fwrite, which allows quoting in the same way as write.csv:

write_quoted_csv <- function(object, path) {
  data.table::setDT(object)                       # convert to data.table by reference (no copy)
  data.table::fwrite(object, path, quote = TRUE)  # write csv with character fields and headers quoted
  data.table::setDF(object)                       # restore the data.frame class
}

Let's test it against write_delim using a data frame with 50,000 rows:

df <- data.frame(a = 1:50000, 
                 b = 50001:100000, 
                 c = rep(LETTERS[1:10], each = 5000))

microbenchmark::microbenchmark(
  readr      = readr::write_delim(df, "~/test_readr.csv", delim = ",", na = ""),
  data.table = write_quoted_csv(df, "~/test_datatable.csv"), 
  times      = 100)
# Unit: milliseconds
#        expr       min       lq      mean    median        uq       max neval
#       readr 244.87593 257.6236 276.91877 262.86998 283.07285 416.79254   100
#  data.table  20.80768  22.8940  26.25808  24.92915  27.69624  54.55789   100

You can see that the data.table method is over 10 times faster. On top of that, write_delim doesn't add the quotes, whereas fwrite does:

cat(readLines("~/test_readr.csv", 10), sep = "\n")
#> a,b,c
#> 1,50001,A
#> 2,50002,A
#> 3,50003,A
#> 4,50004,A
#> 5,50005,A
#> 6,50006,A
#> 7,50007,A
#> 8,50008,A
#> 9,50009,A
cat(readLines("~/test_datatable.csv", 10), sep = "\n")
#> "a","b","c"
#> 1,50001,"A"
#> 2,50002,"A"
#> 3,50003,"A"
#> 4,50004,"A"
#> 5,50005,"A"
#> 6,50006,"A"
#> 7,50007,"A"
#> 8,50008,"A"
#> 9,50009,"A"

So, with this fast helper to pass as FUN, you can write your S3 file like this:

s3write_using(file_filtered,
              FUN = write_quoted_csv,
              object = paste0(output_path, "file-", lubridate::today(), ".csv"),
              bucket = input_bucket)
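
If you want to spot-check the quoting after the upload, a sketch along these lines should work (assuming aws.s3::s3read_using, which downloads the object and passes the local file path to FUN):

# Print the first few raw lines of the uploaded object to confirm the quoting
s3read_using(FUN = function(path) cat(readLines(path, 5), sep = "\n"),
             object = paste0(output_path, "file-", lubridate::today(), ".csv"),
             bucket = input_bucket)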
Allan Cameron
  • The idea is to have strings enclosed, as we can do with write.csv(quote = TRUE), but I am not able to do that with write_csv. I want to use write_csv due to optimization issues. – Jose S. Ameijeiras Jan 15 '20 at 10:08
  • @JoseS.Ameijeiras I have updated my answer to give you a very efficient solution. – Allan Cameron Jan 18 '20 at 16:19