Using read.table for specific columns of .csv files

Question

I asked this question before, but presented my data in a sloppy and confusing way. For context, see Importing fread vs read.table and errors.

I want to read in a selection of columns from a bunch of .csv files and bind these together. As these .csv files are very big, it is not possible to import the files completely.

I tried to do this with the following code for columns 1, 2, 3, 25 and 29:

my.files <- list.files(pattern=".csv")
my.data <- lapply(my.files,fread, header = FALSE, sep=",", select = c(1,2,3,25,29)) 
df <- do.call("rbind", my.data)

However, fread read every selected column in as character, which made it impossible to create good graphs. I tried to convert the character data to numeric (see the context link), but that didn't work either.

When I use read.table instead of fread on one of the files, the data is read in correctly. I would therefore like code that does the same as the fread version above, but with read.table. I tried the following, but it didn't work:

my.files <- list.files(pattern=".csv")
my.data <- lapply(my.files,read.table, header = FALSE, sep=",", colClasses = c(1,2,3,25,29)) 
df <- do.call("rbind", my.data)

How can I read in specific columns from the .csv files with read.table and bind them together, without having to read in the complete files?

score 0 · Answer 1 · answered Feb 03 '18 at 17:50

colClasses takes a vector giving the expected class of each column. Use the string "NULL" for any column you want to skip.

For example:

colClasses = c("character", "NULL", "character")

In this case, columns 1 and 3 are set to character and column 2 is skipped.
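
As a minimal sketch of this behaviour, the snippet below reads a small three-column CSV supplied inline through the `text` argument of read.table. The sample data and the variable names `csv_text`, `kept` and `unquoted` are made up for illustration. It also shows why the quotes matter: an unquoted NULL is silently dropped by c(), so the vector ends up one element short.

```r
# Hypothetical three-column CSV, passed inline instead of from a file.
csv_text <- "a,1,x\nb,2,y"

# The quoted string "NULL" skips column 2; columns 1 and 3 are read as character.
kept <- read.table(text = csv_text, header = FALSE, sep = ",",
                   colClasses = c("character", "NULL", "character"))

# An unquoted NULL disappears inside c(), leaving only two entries.
unquoted <- c("character", NULL, "character")

str(kept)
length(unquoted)
```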

Esteban PS

NULL should be a quoted string, i.e. `colClasses = c("character", "NULL" , "character")` – Stuart R. Jefferys Jun 16 '22 at 18:01

G. Grothendieck · Accepted Answer · 2018-02-03T18:18:24.820

First, determine the number of columns, nc, by reading only the first row of the first file. Then use nc to build a colClasses vector that is "NULL" for every column except the desired ones, which should be NA so that read.csv infers their classes automatically. Finally, read each file with that colClasses vector and rbind the resulting data frames together.

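# Count the columns using only the first row of the first file.
nc <- ncol(read.csv(my.files[1], header = FALSE, nrows = 1))
# "NULL" skips a column; NA keeps it and lets read.csv infer its class.
colClasses <- replace(rep("NULL", nc), c(1:3, 25, 29), NA)
my.data <- lapply(my.files, read.csv, header = FALSE, colClasses = colClasses)
df <- do.call("rbind", my.data)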
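
To try the approach without the original data, here is a self-contained sketch under some assumptions: it writes two small made-up CSV files (five columns each, no header) into a temporary directory and keeps columns 1, 2 and 5 rather than 1:3, 25 and 29. The file names, the `keep` vector and the demo values are illustrative only, and all files are assumed to have the same number of columns.

```r
# Hypothetical demo data: two 3-row, 5-column CSV files without headers.
tmp <- tempfile("demo")
dir.create(tmp)
for (i in 1:2) {
  write.table(matrix(i * 1:15, nrow = 3),
              file.path(tmp, sprintf("part%d.csv", i)),
              sep = ",", row.names = FALSE, col.names = FALSE)
}
# Anchored regex so that only names ending in ".csv" match.
my.files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

keep <- c(1, 2, 5)  # illustrative column selection

# Same steps as above: count columns, build colClasses, read, bind.
nc <- ncol(read.csv(my.files[1], header = FALSE, nrows = 1))
colClasses <- replace(rep("NULL", nc), keep, NA)
my.data <- lapply(my.files, read.csv, header = FALSE, colClasses = colClasses)
df <- do.call("rbind", my.data)

str(df)
```

Because the kept columns are given NA rather than a fixed class, read.csv converts them itself, so numeric data should come back as numeric rather than character.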

edited Feb 03 '18 at 18:18

answered Feb 03 '18 at 18:09

G. Grothendieck

Thanks a lot! Took a while, but in the end it worked. – Marijn Feb 03 '18 at 19:55
