I am reading a file in R with fread. The file has many columns and rows and looks like this:

1_17118 1_18353 1_21882 1_21955 1_22054
Ind0001  -1      -1      -1      -1
Ind0002  -1      -1      -1      -1
Ind0003  -1      -1      -1      -1
Ind0005  -1      -1      -1      -1
Ind0006  -1      -1      -1      -1

I am reading the file like this:

M <- fread("file.txt")

And I am getting the following error:

Error in fread("file.txt") :
  embedded nul in string: '\xff\xff\xff\001\0\0'
Execution halted

I previously read another file that looks very similar without running into this problem.

Apart from the header and the first column, my columns contain only -1, 1 and 0. I guess there must be a stray string somewhere among those values and that this is causing the problem. How can I identify any string inside my file? I tried several approaches with grep, but I am not sure how to search for an arbitrary string. How can I solve this problem?

oguz ismail
Eric González
  • What's the encoding of your `file.txt`? Try `fread(file, encoding = 'UTF-8')` and see [here](https://stackoverflow.com/questions/29939478/fread-data-table-in-r-with-specification-of-encoding) for reference – Bastian Schiffthaler Nov 06 '19 at 12:56
  • The encoding of the file that works is: `file -bi file1.txt` gives `text/plain; charset=us-ascii`. For the second file, which doesn't work, `file -bi file2.txt` gives `text/plain; charset=us-ascii` – Eric González Nov 06 '19 at 13:02
  • So a `\0` is not expected in a `us-ascii` encoded file. Probably something was not exported correctly. You can try to remove them with `tr < file.txt -d '\000' > file_no_null.txt` – Bastian Schiffthaler Nov 06 '19 at 13:08
  • Thank you very much. Do you know how I could visualize (identify) the problematic lines? – Eric González Nov 06 '19 at 13:19
  • Something like `grep -n -a -P '\x00' file.txt` (for GNU grep) – Bastian Schiffthaler Nov 06 '19 at 13:25
  • Thank you very much. I cannot find anything with that grep. Also, the expression `tr < file.txt -d '\000' > file_no_null.txt` didn't work. I am not sure what is happening. – Eric González Nov 06 '19 at 13:34
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/201948/discussion-between-bastian-schiffthaler-and-eric-gonzalez). – Bastian Schiffthaler Nov 06 '19 at 13:36
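
The NUL-byte check discussed in the comments can be sketched as a small shell script. This is only an illustration under assumptions: the function names `count_nul` and `strip_nul` and the demo file names are hypothetical, and the demo file is generated here rather than taken from the question. Counting first shows whether the file contains any `\0` bytes at all before a cleaned copy is written.

```shell
#!/bin/sh
# Count NUL bytes in a file: delete everything except NULs, then count the rest.
count_nul() {
    tr -cd '\000' < "$1" | wc -c | tr -d ' '
}

# Write a copy of the file with all NUL bytes removed.
strip_nul() {
    tr -d '\000' < "$1" > "$2"
}

# Demo on a small made-up file that contains two NUL bytes.
printf 'Ind0001\t-1\t\000\000-1\n' > demo_nul.txt
echo "NUL bytes before: $(count_nul demo_nul.txt)"
strip_nul demo_nul.txt demo_clean.txt
echo "NUL bytes after:  $(count_nul demo_clean.txt)"
```

If the count for the real file is 0, the file on disk has no embedded NULs and the error likely comes from somewhere else, which would match the report above that grep found nothing.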

1 Answer

In my case, the problem with fread was the size of my file (2.7G). Using R version 3.6.0, fread was unable to read the whole file. The solution was to split the file into two smaller files, read each one, and then merge the two with rbind. After that, everything worked normally.
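
One way to do such a split from the shell is sketched below. The answer does not say how the file was split, so this is an assumption: the function name `split_in_two`, the `.part1`/`.part2` suffixes and the demo file are all hypothetical. The header line is copied into both halves so that each piece can be read on its own.

```shell
#!/bin/sh
# Split a file with one header line into two halves, repeating the header in each.
split_in_two() {
    src=$1
    total=$(wc -l < "$src" | tr -d ' ')   # total lines, including the header
    body=$((total - 1))                   # data rows only
    half=$(( (body + 1) / 2 ))            # rows in the first half, rounded up
    head -n 1 "$src" > "$src.part1"
    head -n 1 "$src" > "$src.part2"
    tail -n +2 "$src" | head -n "$half" >> "$src.part1"
    tail -n +$((half + 2)) "$src" >> "$src.part2"
}

# Demo on a small made-up file: one header line and five data rows.
printf '%s\n' 'id v1 v2' \
    'Ind0001 -1 -1' 'Ind0002 -1 -1' 'Ind0003 -1 -1' \
    'Ind0005 -1 -1' 'Ind0006 -1 -1' > demo_split.txt
split_in_two demo_split.txt
wc -l demo_split.txt.part1 demo_split.txt.part2
```

On the R side, the two pieces could then be combined with something like `rbind(fread("file.txt.part1"), fread("file.txt.part2"))`, assuming both parts end up with the same columns.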

Eric González