
Is it possible to load data from several files at once into an ff data frame (ffdf)? Let's say I have

big_file_part1.csv
big_file_part2.csv
big_file_part3.csv

I know I could load each csv file into a separate ffdf object and then combine them with ffdfrbind.fill. But that seems inefficient, since the data gets loaded twice. Is there a more direct way?

LucasMation
  • How big are your files? – Miha Trošt Oct 17 '14 at 18:57
  • There are actually 27 files, one for each state. 10GB in total, but most states are small (<100MB) and 2 states are much larger (>4GB each, which is more than my RAM) – LucasMation Oct 17 '14 at 19:00
  • You can use the argument 'x' of read.csv.ffdf to append your data to an existing ffdf, provided the different csv files have the same structure. –  Oct 20 '14 at 07:23

1 Answer


This is how I did it (note that my source data does not have any headers).

First step - make sure all your files are in the same folder. Set your working directory to the folder.

#load the ffbase library (this also attaches ff, which provides read.csv.ffdf)
library(ffbase)

#create a vector of the files that I want to load
#(pattern is a regular expression, not a shell glob)
temp <- list.files(pattern = "\\.csv$")

#read the first file to create the ffdf that the remaining files are appended to
mydata <- read.csv.ffdf(file = temp[1], header = FALSE, VERBOSE = TRUE,
                        first.rows = 100000, next.rows = 100000, colClasses = NA)

#loop through the remaining files and append each one via the 'x' argument
#(seq_along(temp)[-1] is empty when there is only one file, so the loop is skipped)
for (i in seq_along(temp)[-1]) {
  mydata <- read.csv.ffdf(x = mydata, file = temp[i], header = FALSE, VERBOSE = TRUE,
                          first.rows = 100000, next.rows = 100000)
}
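
To try the append approach at small scale before running it on the real files, something like the following sketch may help. It assumes the ff package is installed; the demo folder, the file contents and the names demo_dir, files, combined and expected_rows are made up for illustration. It writes three tiny header-less csv parts, appends them through the 'x' argument, and checks that the row count of the result matches the number of lines in the files.

```r
library(ff)  # provides read.csv.ffdf

# Hypothetical demo data: three small header-less csv parts in a temporary folder
demo_dir <- file.path(tempdir(), "ffdf_parts_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (k in 1:3) {
  part <- data.frame(id = (3 * k - 2):(3 * k), value = c(0.5, 1.5, 2.5) * k)
  write.table(part, file.path(demo_dir, sprintf("big_file_part%d.csv", k)),
              sep = ",", row.names = FALSE, col.names = FALSE)
}

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read the first part, then append the remaining parts through the 'x' argument
combined <- read.csv.ffdf(file = files[1], header = FALSE)
for (f in files[-1]) {
  combined <- read.csv.ffdf(x = combined, file = f, header = FALSE)
}

# Compare the number of rows in the ffdf with the total number of lines on disk
expected_rows <- sum(vapply(files, function(f) length(readLines(f)), integer(1)))
stopifnot(nrow(combined) == expected_rows)
```

Counting lines with readLines is only reasonable for small files like these; for multi-gigabyte parts a cheaper line count would be needed.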
Winnie Kuo