Questions tagged [ff]

An R package that provides memory-efficient storage of large data on disk and fast access functions

The ff package provides data structures that are stored on disk but behave (almost) as if they were in RAM by transparently mapping only a section (pagesize) in main memory.

165 questions
2
votes
1 answer

Convert an ff POSIXct vector to an ff numeric vector

I am trying to convert an ff vector with POSIXct entries to an ff numeric vector, containing the respective number of seconds since the origin 01-01-1970. x = as.ff(as.POSIXct(c("2014-06-30 00:01:27 BST", "2014-06-30 00:02:17 BST"))) The 'natural'…
Audrey
  • 212
  • 4
  • 15
2
votes
1 answer

Using ffsave and ffload from package ff

I have a problem with *.ff files when I use ffsave and ffload in R. When I use ffsave(fileName) I see the fileName.ffData and fileName.RData. My questions are: Are the *.ff files created somewhere when I use ffsave or after I use ffload(fileName)? Can I…
user1938809
  • 1,135
  • 1
  • 9
  • 12
2
votes
2 answers

computing multiple fixed effects on large dataset

I'm trying to perform a fixed effects regression for two factor variables in a CSV dataset containing over 4000000 rows. These variables can respectively assume about 140000 and 50000 different integer values. I initially attempted to perform the…
lebedov
  • 1,371
  • 2
  • 12
  • 27
2
votes
2 answers

Functions for creating and reshaping big data in R using the FF package

I'm new to R and the FF package, and am trying to better understand how FF allows users to work with large datasets (>4Gb). I have spent a considerable amount of time trawling the web for tutorials, but the ones I could find generally go over my…
Luke23
  • 33
  • 5
2
votes
2 answers

How to column bind two ffdf

Suppose two ffdf files: library(ff) ff1 <- as.ffdf(data.frame(matrix(rnorm(10*10),ncol=10))) ff2 <- ff1 colnames(ff2) <- 1:10 How can I column bind these without loading them into memory? cbind doesn't work. There is the same question…
user2763361
  • 3,789
  • 11
  • 45
  • 81
2
votes
1 answer

How to subset a large data frame (ffdf) in R by date?

I am trying to subset an FFDF by a date. Below, I have successfully created such a subset using a normal data frame. But I needed some help in applying this to an FFDF. My attempt, along with the error message, is listed in the code comment. Many…
Tyler Durden
  • 303
  • 5
  • 12
2
votes
1 answer

R - ff package - arithmetic operations on matrices

Is there a way to do simple arithmetic operations on ff class matrices? i.e. something like this: > library(ff) > a = ff(1, vmode = "double", dim = c(3,4)) > b = ff(2, vmode = "double", dim = c(3,4)) > a+b Error in a + b : non-numeric argument to…
2
votes
3 answers

ff package write error

I'm trying to work with a 1909x139352 dataset using R. Since my computer only has 2GB of RAM, the dataset turns out to be too big (500MB) for the conventional methods. So I decided to use the ff package. However, I've been having some troubles. The…
2
votes
1 answer

Replace NAs in a ffdf object

I'm working with an ffdf object which has NAs in some of the columns. The NAs are the result of a left outer merge using merge.ffdf. I would like to replace the NAs with 0s but am not managing to do it. Here is the code I am running: library(ffbase) …
ddg
  • 2,493
  • 2
  • 20
  • 23
1
vote
0 answers

could not find function "as.data.frame.ffdf"

I'm following the guide Big Data Analytics with R. But the as.data.frame.ffdf function seems to be missing. Does anyone have an idea? Or is there any alternative solution? Here's sample code: # Data were downloaded from Bureau of Transportation…
1
vote
0 answers

How to deal with a memory limitation error when applying the biglm function to an ffdf object

I have a large ffdf object in R. It contains x and y values, and each column has 71,998,512 values. I am trying to apply the biglm function from the biglm package as below: dat <- ffdf(Back = Pitch_Back$V1, Head = Pitch_Head$V1) fit_linear <-…
imtaiky
  • 191
  • 1
  • 12
1
vote
1 answer

Querying out of memory 60gb tsv's in R on the first column, which database/method?

I have 6 large TSV matrices of 60 GB (uncompressed) containing 20 million rows x 501 columns: the first is an index/integer column that is basically the row number (so not even necessary), and the other 500 columns are numerical (float, 4 decimals, e.g. 1.0301). All…
tafelplankje
  • 563
  • 1
  • 7
  • 21
1
vote
0 answers

Compare two ffdf

I have two very large data sets (50M rows, 130 columns) which I can't compare with basic packages. Therefore I have to use an ffdf. It's the first time I am working with the ff package. I am trying to compare two ffdf and then write the differences…
Marvelous
  • 21
  • 1
1
vote
0 answers

How to replace specific values in an ffdf?

I want to replace all values within an ffdf (using ff package in R for large data). Typically, in a normal dataframe, I would use something like this: df[df>0] <- 1 Is there an analogous method for ffdf? I wish to keep the object as class "ffdf".
niafall
  • 164
  • 1
  • 1
  • 11
1
vote
1 answer

ff: returning multiple arrays with a single ffapply function call

I am dealing with a large dataset of 3D imaging data that I have loaded into R using ff(). require(ff) nSubj <- 125 vol_dim <- c(139,137,87) ff_qmap <- ff(0, dim=c(vol_dim,nSubj)) Simple calls like getting an average array/"volume" back work…
user10023347
  • 141
  • 1
  • 5