Using readr to import a big data file with different row lengths and whitespace as delimiter

Question

I am having trouble reading a big file (close to 2,000,000 rows) using the readr package.

Why do I want to use the readr package? My data file can contain ASCII control characters (0x1A, i.e. ASCII 26, i.e. CTRL+Z) that stop the execution of read.table(), and I have noticed that the readr package is not sensitive to that problem.

My file has rows of different lengths, so I would use fill=TRUE if I could use read.table().

I tried to use read_table from the readr package, but without success, because it does not seem to recognise whitespace as the column separator.

I also tried read_delim, with the code read_delim(file,delim=" "). The separator was found, but the length of the first row was taken as the number of columns of my data frame, so longer rows were truncated.

Does anyone have any advice?

Could you show how you've tried using `read_table()`? This function was created precisely to [read text files where columns are separated by whitespace](http://search.r-project.org/library/readr/html/read_table.html). Thinking of an alternative, do you know the `width` of each column, or each column's initial and final positions? — rafa.pereira, May 12 '16 at 14:46
I've just tried with fread, but it seems to stop at an empty line. Do you have any idea how to avoid that error? — etienne leroy, May 12 '16 at 14:59
You could try to set `blank.lines.skip = TRUE`, see `?fread` for info on all the parameters. — Jaap, May 12 '16 at 15:04
I now understand something about how read_table works. It seems not to work unless, as the documentation says, "Each line is the same length, and each field is in the same position in every line. It's similar to read.table, but rather than parsing like a file delimited by arbitrary amounts of whitespace, it first finds empty columns and then parses like a fixed width file." Indeed, I tried with 10 identical lines and it works, but if I add a shorter line at the beginning it doesn't. — etienne leroy, May 12 '16 at 15:12
There is an error when I try to use `blank.lines.skip = TRUE` in fread: `essai2 <- fread(file2, sep = " ", blank.lines.skip = TRUE)` gives `Error in fread(file2, sep = " ", blank.lines.skip = TRUE) : unused argument (blank.lines.skip = TRUE)` — etienne leroy, May 12 '16 at 15:17
The fread argument `blank.lines.skip` is only available in data.table 1.9.7, which is not yet on CRAN ([link](http://stackoverflow.com/questions/34539408/data-table-fread-how-to-ignore-empty-line)), and unfortunately I haven't managed to install it. — etienne leroy, May 13 '16 at 06:56
Could you try `read_delim(file,delim="")`? Note that I removed the space between the quotation marks. — rafa.pereira, May 13 '16 at 08:03
`read_delim(file,delim="")` gave me the following result: a data frame with a single column, with one line of my data in each row. — etienne leroy, May 13 '16 at 09:09

score 0 · Answer 1 · answered May 13 '16 at 14:37

I succeeded in reading my data (from the file named `file`) into a data frame (`rtcm1`) using the following code:

    library(readr)

    # Create a vector of column names; its main purpose here is to fix the
    # number of columns used to import the file
    col <- paste("V", 1:17, sep = "")

    # Use read_delim() from the readr package with a single space as separator.
    # I don't really know why, but I need quote = "" to collect all my data,
    # maybe so that " is not treated as a quoting character.
    rtcm1 <- read_delim(file, delim = " ", col_names = col, quote = "")

With this solution, cells with no data are filled with NA and the function gives warnings, but it seems to work well.
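The 17 in the code above is hard-coded. If the widest row is not known in advance, one possible way to derive it is to count the fields on each line first. The following is only a minimal sketch under some assumptions: the sample file `tmp` and its contents are made up for illustration, fields are separated by exactly one space, and reading the file twice is acceptable for a file of this size.

```r
library(readr)

# Hypothetical sample file with rows of different lengths (illustration only)
tmp <- tempfile(fileext = ".txt")
writeLines(c("a b c", "d e f g h", "i j"), tmp)

# Read the raw lines and count the space-separated fields on each one
lines <- read_lines(tmp)
n_fields <- lengths(strsplit(lines, " ", fixed = TRUE))
max_cols <- max(n_fields)

# Build one column name per field of the widest row, then import as before;
# shorter rows are padded with NA and readr reports parsing warnings
col <- paste0("V", seq_len(max_cols))
dat <- read_delim(tmp, delim = " ", col_names = col, quote = "")
```

On a real file, `tmp` would simply be replaced by the path to the data file. Runs of several consecutive spaces would be counted as empty fields by this approach, so it only fits files with a single space between values.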
