I have a problem reading a big file (close to 2,000,000 rows) using the readr package.

Why do I want to use readr? My data file can contain ASCII control characters (0x1A, i.e. ASCII 26, Ctrl+Z) that stop the execution of read.table(), and I noticed that the readr package is not affected by that problem.

My file has rows of different lengths, so I would use fill=TRUE if I could use read.table().

I tried read_table from the readr package, but without success: it does not seem to detect whitespace as the column separator.

I also tried read_delim, with the code read_delim(file,delim=" "). The separator was found, but the number of columns was taken from the first row, so longer rows were truncated.

Does anyone have any advice?

  • Did you try the `fread` function from the `data.table` package? – Jaap May 12 '16 at 14:35
  • Could you show how you've tried using `read_table()`? This function was created precisely to [read text files where columns are separated by whitespace](http://search.r-project.org/library/readr/html/read_table.html). Thinking of an alternative, do you know the `width` of each column, or the columns' start and end positions? – rafa.pereira May 12 '16 at 14:46
  • I've just tried fread, but it seems to stop at an empty line. Do you have any idea how to avoid that error? – etienne leroy May 12 '16 at 14:59
  • You could try to set `blank.lines.skip = TRUE`, see `?fread` for info on all the parameters. – Jaap May 12 '16 at 15:04
  • I think I understand how read_table works now. It seems to work only if "Each line is the same length, and each field is in the same position in every line. It's similar to read.table, but rather than parsing like a file delimited by arbitrary amounts of whitespace, it first finds empty columns and then parses like a fixed width file." Indeed, I tried with 10 identical lines and it works, but if I add a shorter line at the beginning it doesn't. – etienne leroy May 12 '16 at 15:12
  • There is an error when I try to use `blank.lines.skip = TRUE` in fread: `> essai2<-fread(file2,sep=" ",blank.lines.skip = TRUE)` gives `Error in fread(file2, sep = " ", blank.lines.skip = TRUE) : unused argument (blank.lines.skip = TRUE)` – etienne leroy May 12 '16 at 15:17
  • fread's `blank.lines.skip` is only available in data.table 1.9.7, which is not yet on CRAN ([link](http://stackoverflow.com/questions/34539408/data-table-fread-how-to-ignore-empty-line)), and unfortunately I have not succeeded in installing it. – etienne leroy May 13 '16 at 06:56
  • Could you try `read_delim(file,delim="")`? Note that I removed the space between the quotation marks. – rafa.pereira May 13 '16 at 08:03
  • `read_delim(file,delim="")` gave me the following result: a data frame with a single column, with one line of my data in each row. – etienne leroy May 13 '16 at 09:09

1 Answer


I succeeded in collecting my data (from the file named file) into a data frame (rtcm1) using the following code:

    library(readr)

    # Create a vector of column names. In practice I use it mainly to set
    # the number of columns to read when importing my file.
    col <- paste0("V", 1:17)

    # Use read_delim from the readr package with a single space as the separator.
    # I don't really know why, but I need quote = "" to collect all my data,
    # maybe so that " is not treated as a quoting character.
    rtcm1 <- read_delim(file, delim = " ", col_names = col, quote = "")

With this solution, cells with no data are filled with NA and the function gives warnings, but it seems to work well.
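
The number of columns (17 above) was chosen by hand. As a minimal sketch, assuming fields are always separated by exactly one space, the widest row can be found first with readr's `read_lines()` and the column names built from that count. The temporary file and the names `path`, `n_fields`, `col_names_vec` and `ragged` are made up for illustration, and the whole file is read twice, which may be slow for 2,000,000 rows.

```r
library(readr)

# Hypothetical ragged sample file standing in for the real data
path <- tempfile(fileext = ".txt")
writeLines(c("a b c d", "e f", "g h i"), path)

# Count the fields on each line and keep the maximum
n_fields <- max(lengths(strsplit(read_lines(path), " ", fixed = TRUE)))

# One name per column of the widest row
col_names_vec <- paste0("V", seq_len(n_fields))

# Same call as in the answer; all columns are read as character here
# so that no type guessing is involved
ragged <- read_delim(
  path,
  delim = " ",
  col_names = col_names_vec,
  col_types = cols(.default = col_character()),
  quote = ""
)

# Rows that were shorter than n_fields are reported here
problems(ragged)
```

Shorter rows should come back padded with NA, as described above, and `problems()` lists the rows that triggered the warnings.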