Questions tagged [ff]

An R package that provides memory-efficient storage of large data on disk and fast access functions

The ff package provides data structures that are stored on disk but behave (almost) as if they were in RAM by transparently mapping only a section (pagesize) into main memory.

165 questions
1
vote
1 answer

Reshape ffdf dataframe in R

I am using the dcast function to reshape a data frame in R. My data frame is large, so I converted it to an ffdf, but I am unable to use dcast on it. Please help if there are any alternatives. Find below the example I used for a small data frame and what I…
Naga Pavan
1
vote
1 answer

Unable to access ffdf file from .RData

I loaded big data files (https://www.kaggle.com/c/avazu-ctr-prediction) with read.csv.ffdf from the ff package, using the following command: train = read.csv.ffdf(file="path to my big data files/train.csv",VERBOSE=TRUE) and then saved it using…
1
vote
1 answer

R with FF crashes when loading a large dataset

Good evening, I am attempting to load a dataset into R (~20 million rows, 140 columns, ~6.2 GB on disk) using either LaF and ffbase or ff. In either case the load fails. struct <- detect_dm_csv(file = '/scratch/proj.csv', header = TRUE) colClasses <-…
1
vote
1 answer

Read data from multiple CSV files into single ffdf object

Is it possible to load data from several files at once into an ff data frame (ffdf)? Let's say I have big_file_part1.csv, big_file_part2.csv, and big_file_part3.csv. I know I could load each CSV file into a separate ffdf object and then ffdfrbind.fill them…
LucasMation
1
vote
1 answer

R ff package creating a new column gives an error "non-numeric argument to binary operator"

a <- data.frame(x=c(1,2,3), y=c(10,10,20)) a x y 1 1 10 2 2 10 3 3 20 a$z = a$x / a$y # works with data frame a x y z 1 1 10 0.10 2 2 10 0.20 3 3 20 0.15 a <- data.frame(x=c(1,2,3), y=c(10,10,20)) a_ff <- as.ffdf(a) a_ff$z = a_ff$x /…
Timothée HENRY
1
vote
0 answers

Lookup ff vectors using ffvecapply

I am trying to substitute values of an ff vector using two other vectors. With RAM objects it is straightforward: w <- 3:6; w1 = 1:10; w2 = letters[1:10] # a way to do it: sapply(w, FUN=function(x){ w2[which(w1 == x)] } ) [1] "c" "d" "e" "f" ff…
Audrey
1
vote
1 answer

correlation matrix using large data sets in R when ff matrix memory allocation is not enough

I have a simple analysis to do: I just need to calculate the correlation of the columns (or rows, if transposed). Simple enough? I have been unable to get results all week, and I have looked through most of the solutions here. My laptop…
1
vote
1 answer

How to specify colClasses when reading a very big csv file into R using read.table.ffdf?

I am trying to read a very big .csv file, around 20 GB in size, using the function read.table.ffdf() in the "ff" package, but I had trouble specifying the colClasses option in read.csv(). I have to specify the colClasses option because some columns…
user3574507
1
vote
1 answer

Merging ffdf dataframes in R

I need an outer join of ffdf dataframes saved in a list. Have checked this, but it refers to a different problem. Example code for RAM objects: x1 = data.frame(name='a1', Ai=2, Ac=1, Bi=1) x2 = data.frame(name='a2', Ai=1, Bi=3, Bc=1, Ci=1) x3 =…
Audrey
1
vote
2 answers

Character vectors as ff objects in R

I am trying to convert a standard (RAM) character vector to an ff object (vector). The code below returns an error: > as.ff(c('a', 'b')) Error in ff(initdata = initdata, length = length, levels = levels, ordered = ordered,: vmode 'character' not…
Audrey
1
vote
2 answers

Combine factor levels in an ff object

I often categorise times into day/night time using cut(). Because cut() doesn't understand that clock times go around zero, I first divide the hours into three groups (night either side of day), and then merge the two "night" factor levels. This…
nacnudus
1
vote
1 answer

Doing calculations on dataframe from ffdf object

I'm working with a large dataset (3.5M lines and 40 columns) and I need to clean out some values so I'll be able to calculate other parameters that are necessary when I start formulating a model around the data. The problem is that it is taking…
1
vote
1 answer

Columnbind ff data frames in R

I am trying to work with the ff package, and in this context I am trying to cbind two ffdf data frames. I found a solution for combining an ffdf with an ff vector, but how do I combine two ffdf objects? Here is my code for combining an ffdf with an ff vector: library(ff) ## read Bankfull…
yemmit
0
votes
0 answers

Can I use biglm and/or ff packages for random effects estimation in R?

I have a very large panel in R. I tried to perform a plm regression and received the error "cannot allocate vector of size 11 Gb". I found out that regression in chunks could be a solution and tried to use the biglm and/or ff packages. My question is: can…
0
votes
1 answer

Merging and appending a list of ffdf dataframes

I would like to read a vector of CSV file names as ffdf data frames and combine them into one big ffdf data frame. I have found solutions using other R packages; however, my issue is that my combined data can reach 40 GB, which definitely needs to be…
ahmathelte