I asked this question before, but presented my data in a sloppy and confusing way. For context, see "Importing fread vs read.table and errors".

I want to read in a selection of columns from a bunch of .csv files and bind these together. As these .csv files are very big, it is not possible to import the files completely.

I tried to do this with the following code, selecting columns 1, 2, 3, 25 and 29:

my.files <- list.files(pattern=".csv")
my.data <- lapply(my.files,fread, header = FALSE, sep=",", select = c(1,2,3,25,29)) 
df <- do.call("rbind", my.data)

However, with fread every column in the resulting data set is of type character, which makes it impossible to create good graphs. I tried to convert the character data to numeric (see the context link), but that didn't work either.

When I use read.table instead of fread on one of the files, the data is read in correctly. I would therefore like code that does the same as the fread version above, but with read.table. I tried the following, but it didn't work.

my.files <- list.files(pattern=".csv")
my.data <- lapply(my.files,read.table, header = FALSE, sep=",", colClasses = c(1,2,3,25,29)) 
df <- do.call("rbind", my.data)

How can I read in specific columns from the .csv files with read.table and bind them together, without having to read in the complete files?

Marijn

2 Answers

colClasses takes a vector giving the expected class of each column in the file, one entry per column. Use the quoted string "NULL" for any column you want to skip.

For example:

colClasses = c("character", "NULL", "character")

In this case, columns 1 and 3 are set to character and column 2 is skipped.
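A minimal sketch of this behaviour, assuming a made-up three-column CSV supplied inline through the text argument rather than a real file (csv_text and skipped are illustrative names only). It also shows why the quotes matter: an unquoted NULL simply disappears from c(), leaving a vector that is one element too short.

```r
# Hypothetical three-column input, supplied inline instead of read from disk.
csv_text <- "a,1,x\nb,2,y\nc,3,z"

# The quoted string "NULL" tells read.csv to drop that column entirely.
skipped <- read.csv(text = csv_text, header = FALSE,
                    colClasses = c("character", "NULL", "character"))

ncol(skipped)   # only two columns remain
names(skipped)  # the kept columns retain their original names (V1 and V3)

# An unquoted NULL is silently dropped by c(), so the vector has length 2, not 3.
length(c("character", NULL, "character"))
```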

Esteban PS

First determine the number of columns, nc, by reading only the first row of the first file. Using nc, build a colClasses vector in which every entry is "NULL" except for the desired columns, which should be NA. "NULL" skips a column, while NA lets read.csv work out the column's type itself. Then read each file with that colClasses vector and rbind the resulting data frames together.

nc <- ncol(read.csv(my.files[1], header = FALSE, nrows = 1))
colClasses <- replace(rep("NULL", nc), c(1:3, 25, 29), NA)
my.data <- lapply(my.files, read.csv, header = FALSE, colClasses = colClasses)
do.call("rbind", my.data)
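To try the approach without the original data, here is a sketch that runs the same steps on two small temporary files. The file contents, the five-column width, the kept columns c(1, 2, 5) and the names demo.files, demo.data and demo.df are all made up for illustration.

```r
# Write two small hypothetical CSV files (5 columns each) to a temp directory.
tmp_dir <- tempfile("csv_demo")
dir.create(tmp_dir)
writeLines(c("1,2.5,a,b,10", "2,3.5,c,d,20"), file.path(tmp_dir, "one.csv"))
writeLines("3,4.5,e,f,30", file.path(tmp_dir, "two.csv"))

# Anchor the pattern so that only names ending in ".csv" match.
demo.files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Count the columns from the first row of the first file only.
nc <- ncol(read.csv(demo.files[1], header = FALSE, nrows = 1))

# "NULL" skips a column; NA lets read.csv infer the type of a kept column.
keep <- c(1, 2, 5)
colClasses <- replace(rep("NULL", nc), keep, NA)

demo.data <- lapply(demo.files, read.csv, header = FALSE, colClasses = colClasses)
demo.df <- do.call("rbind", demo.data)

str(demo.df)  # inspect the classes of the kept columns
```

Because the kept columns are left as NA in colClasses, read.csv infers their types per file, so the numeric columns here come back as numbers rather than character strings.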
G. Grothendieck