
I have a CSV file with three rows. The first row has 7 integer values, the second has 5 and the third 3. I want to read this data using colbycol and then run fft on each of the columns. But in the first step, if I use this command:

tbl <- cbc.read.table("c:\\users\\Babak\\Desktop\\test1.csv", header = FALSE, sep = ",")

I get this error:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
  line 2 did not have 7 elements

My question is: is it possible to read such a CSV file with colbycol?

UPDATE: My CSV file contains only:

14,25,83,64,987,45,78
15,45,32,14,4,8
14,89,14,87,37,456
TangoStar
  • I assume those three lines are just the header of the file (otherwise, it would not qualify as "large"): you can skip them by setting the `skip` argument to a non-zero value. – Vincent Zoonekynd Sep 18 '13 at 09:43
  • @VincentZoonekynd I created the CSV file myself and those lines are not a header. I have updated my post; you can see the content of my CSV file. – TangoStar Sep 18 '13 at 09:50
  • Related question: http://stackoverflow.com/questions/5402758/importing-a-txt-file-when-number-of-columns-varies – Vincent Zoonekynd Sep 18 '13 at 09:58
  • But `fill=TRUE` doesn't work with `cbc.read.table`; I have tested it. – TangoStar Sep 18 '13 at 10:00

1 Answer


Is your file really big enough that you need to use cbc.read.table? That is, have you tried and benchmarked the base functions and found them seriously wanting? Base read.table (below) will get the job done even for largish files.

If you want to process a really huge file, this question and its answers describe a number of strategies in addition to colbycol that are perhaps more tried-and-tested than that package (no disrespect to the colbycol author who is of course welcome to comment).

rawtext <- "14,25,83,64,987,45,78
15,45,32,14,4,8
14,89,14,87,37,456"

txt <- read.table(textConnection(rawtext),
                  header = FALSE,
                  sep = ",",
                  fill = TRUE)  # fill = TRUE pads short rows with NA

Giving:

txt
  V1 V2 V3 V4  V5  V6 V7
1 14 25 83 64 987  45 78
2 15 45 32 14   4   8 NA
3 14 89 14 87  37 456 NA
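From there, running fft on each column is straightforward. A minimal sketch (the NA-stripping is my assumption about how you want to handle the padding that fill = TRUE introduces; fft and lapply are base R):

# Apply the FFT column by column; a data.frame is a list of columns,
# so lapply() iterates over them. The NA padding from fill = TRUE is
# stripped first, since NAs would propagate through fft().
res <- lapply(txt, function(col) fft(col[!is.na(col)]))
res$V1  # complex FFT coefficients of column V1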
SlowLearner
  • It is an almost 366 MB CSV file; it contains 7500 columns and 10241 rows. I used `apply` to run fft on each column, but there was not enough memory for that, so now I want to read each column and run fft on it; hence I decided to use `colbycol`. What do you mean? – TangoStar Sep 18 '13 at 11:25
  • If you pre-specify the number of rows and the data types of each column with the `nrows` and `colClasses` arguments to `read.table`, R can pre-allocate the resulting `data.frame` and parse the file quickly and efficiently. I have read 400 MB CSV files with 40 columns and 500,000 rows this way without problems. – Backlin Sep 18 '13 at 12:15
  • I can successfully implement that with `colClasses` and `nrows` with this command: `temptb=read.table("c:\\Path\\xxx.csv",header=FALSE,sep=",",colClasses="numeric",nrows=10000)[,1]`, but it is very slow. As I said, I have 10240 rows and 7500 columns, and running the above command takes 2 minutes. Is that normal? How can I do it better? – TangoStar Sep 18 '13 at 14:40
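For what it's worth, the slowness in that last command likely comes from re-parsing the whole 366 MB file once per column. Reading the file a single time with pre-allocated types and then looping over the columns should be far faster; a minimal sketch, assuming the path and dimensions quoted in the comments above:

# Parse the file once, with types and row count pre-specified so R
# can pre-allocate the data.frame (per Backlin's comment).
dat <- read.table("c:\\Path\\xxx.csv", header = FALSE, sep = ",",
                  colClasses = "numeric", nrows = 10241,
                  comment.char = "")  # disabling comment scanning also speeds up parsing

# Then transform each column in turn without re-reading the file.
ffts <- lapply(dat, fft)

If holding all 7500 complex result vectors in memory at once is too much, process and summarise each column inside a loop instead of keeping the full list.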