
I could not find a proper answer to my problem in previous questions and answers. I have a 2.3 GB CSV file containing 2.4 million rows of Hebrew text, currently encoded in ASCII. Since this is a big file, fread would be preferable, but what about the encoding? Any idea how to read a CSV file encoded in ASCII while avoiding the famous "embedded nul in string" error?

Thank you

Dmitry Leykin
  • https://github.com/Rdatatable/data.table/issues/563 – David Arenburg Apr 29 '15 at 09:25
  • I've tried the solution, but all I get from R is: `> fread("C:/Users/WINDOWS 7/IdeaProjects/PHD/classifier/phdcorpus2_processed/phdcorpus2_processed.csv" , encoding='UTF8')` `Error in fread("C:/Users/WINDOWS 7/IdeaProjects/PHD/classifier/phdcorpus2_processed/phdcorpus2_processed.csv", : unused argument (encoding = "UTF8")` – Dmitry Leykin May 02 '15 at 08:11
  • It is not a solution; it is a feature request (FR) on GitHub, which means that your problem can't be solved with the current `data.table` version, but the developers are working on it. – David Arenburg May 02 '15 at 17:47

1 Answer


As of August 25th, the issue linked by David Arenburg is closed, and the functionality is included in the currently available version of data.table. The encoding parameter can now be used when calling fread:

text <- fread(file, encoding = 'UTF-8')

ASCII is not an explicit encoding option, but ASCII is valid UTF-8, so you can specify UTF-8 when you want to read your Hebrew text.
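
If you want to check this on your own machine before loading the 2.3 GB file, here is a minimal sketch. The temporary file and the two sample Hebrew words are made up for illustration and are not from the question. It first checks whether the installed fread exposes an encoding argument at all (older versions fail with "unused argument"), then reads a small UTF-8 file and inspects how the strings are marked:

```r
library(data.table)

# Check whether this version of fread accepts an `encoding` argument.
has_encoding <- "encoding" %in% names(formals(fread))
if (!has_encoding) {
  stop("This data.table version has no fread(encoding = ...); update the package.")
}

# Hypothetical sample data: a tiny CSV with Hebrew text, written as UTF-8 bytes.
tmp <- tempfile(fileext = ".csv")
sample_lines <- c("id,text",
                  "1,\u05e9\u05dc\u05d5\u05dd",
                  "2,\u05e2\u05d5\u05dc\u05dd")
con <- file(tmp, open = "wb")
writeLines(enc2utf8(sample_lines), con, useBytes = TRUE)
close(con)

# Read the file and tell fread to mark the strings as UTF-8.
dt <- fread(tmp, encoding = "UTF-8")

# Inspect the encoding mark on the text column.
print(Encoding(dt$text))

unlink(tmp)
```

Note that, as far as I understand it, the encoding argument only marks the strings that are read; it does not convert the file, so the file itself has to be stored as UTF-8 for this to display correctly.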

Alex A.
  • I am using data.table 1.9.7 (confirmed with `sessionInfo()`) and I get this error: `Error in fread("data.csv", encoding = "UTF-8") : unused argument (encoding = "UTF-8")` – Jeff Jul 20 '16 at 17:55