11

Since fwrite() does not accept an encoding argument, how can I export a CSV file in a specific encoding as fast as fwrite() does? (fwrite() is the fastest export function I know of so far.)

fwrite(DT,"DT.csv",encoding = "UTF-8")
Error in fwrite(DT, "DT.csv", encoding = "UTF-8") : 
  unused argument (encoding = "UTF-8")
rane
  • As of 2019-March, this is an open issue on the package. See https://github.com/Rdatatable/data.table/issues/1770 – Ricardo Saporta Mar 19 '19 at 17:24
  • So, on Windows, the only way to 100% ensure the exported file is automatically read as UTF-8 in Excel is `write.csv(df, "test.csv", fileEncoding = "UTF-8")`, trading fwrite()'s speed for slower processing, since fwrite() just guesses from a subset of rows of the whole big dataset. Actually, user2554330's solution doesn't solve this topic. – rane Mar 20 '19 at 16:11

4 Answers

10

You should post a reproducible example, but I would guess you could do this by making sure the data in DT is in UTF-8 within R, then setting the encoding of each column to "unknown". R will then assume the data is encoded in the native encoding when you write it out.

For example,

DF <- data.frame(text = "á", stringsAsFactors = FALSE)
DF$text <- enc2utf8(DF$text) # Only necessary if Encoding(DF$text) isn't "UTF-8"
Encoding(DF$text) <- "unknown"
data.table::fwrite(DF, "DF.csv", bom = TRUE)

If the columns of DF are factors, you'll need to convert them to character vectors before this will work.
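For instance, a minimal sketch of that factor-to-character conversion, reusing the DF example above (the lapply() wrapper is just one way to do it):

DF <- data.frame(text = factor("á"))      # factor column, not character
DF[] <- lapply(DF, function(x) {
  if (is.factor(x)) x <- as.character(x)  # factors must become character first
  if (is.character(x)) {
    x <- enc2utf8(x)                      # make sure the underlying bytes are UTF-8
    Encoding(x) <- "unknown"              # so fwrite() writes those bytes out as-is
  }
  x
})
data.table::fwrite(DF, "DF.csv", bom = TRUE)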

user2554330
7

As of writing this, fwrite does not support forcing encoding. There is a workaround that I use, but it's a bit more obtuse than I'd like. For your example:

readr::write_excel_csv(DT[,0],"DT.csv")
data.table::fwrite(DT,file = "DT.csv",append = T)

The first line will save only the headers of your data table to the CSV, defaulting to UTF-8 with the byte order mark required to let Excel know that the file is encoded as UTF-8. The fwrite statement then uses the append option to add additional lines to the original CSV. This retains the encoding from write_excel_csv, while maximizing the write speed.
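If you use this pattern often, it can be wrapped in a small helper. The function name fwrite_utf8_bom below is hypothetical, and the header is taken with DT[0, ] (zero rows, all columns), which is what write_excel_csv needs in order to emit only the header line:

# Hypothetical convenience wrapper around the two-step trick above
fwrite_utf8_bom <- function(DT, path) {
  readr::write_excel_csv(DT[0, ], path)               # header only, UTF-8 with BOM
  data.table::fwrite(DT, file = path, append = TRUE)  # fast append of the data rows
}

fwrite_utf8_bom(DT, "DT.csv")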

SirTain
  • Works, but the `readr::write_excel_csv` command selects the wrong dimension of the data.table. It should rather be `readr::write_excel_csv(DT[0,],"DT.csv")` – hannes101 Mar 16 '22 at 13:57
2

If you work within R, try this approach:

# You have DT
# DT is a data.table / data.frame
# DT$text contains text data not encoded as UTF-8

library(data.table)
DT$text <- enc2utf8(DT$text)      # force the underlying data to be UTF-8 encoded
fwrite(DT, "DT.csv", bom = TRUE)  # then save the file using bom = TRUE

Hope that helps.

s_baldur
Oleksandr
2

I know some people have already answered, but I wanted to contribute a more holistic solution building on the answer from user2554330.

library(data.table)

# Encode column names and data in UTF-8
names(DT) <- enc2utf8(names(DT))           # column names need to be encoded too
for (col in names(DT)) {
    DT[[col]] <- as.character(DT[[col]])   # allows for enc2utf8() and Encoding()
    DT[[col]] <- enc2utf8(DT[[col]])       # same as user2554330's answer
    Encoding(DT[[col]]) <- "unknown"
}

fwrite(DT, "DT.csv", bom = TRUE)

# When re-importing your data be sure to use encoding = "UTF-8"
DT2 <- fread("DT.csv", encoding = "UTF-8") 
# DT2 should be identical to the original DT

This should work for any and all UTF-8 characters anywhere in a data.table.
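For a quick check that the text survived the round trip (the column name text is just an example here):

# Compare one character column by content; TRUE means the UTF-8 text survived
all(DT$text == DT2$text)
Encoding(DT2$text)   # non-ASCII strings should come back marked as "UTF-8"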

cach dies