Questions tagged [ff]

An R package that provides memory-efficient storage of large data on disk and fast access functions

The ff package provides data structures that are stored on disk but behave (almost) as if they were in RAM by transparently mapping only a section (pagesize) in main memory.
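The mapping behaviour described above can be seen in a minimal sketch (assuming the ff package is installed; names are illustrative):

```r
# Minimal sketch, assuming the ff package is installed.
library(ff)

# Create a disk-backed double vector; the data live in a temp file,
# and only the pages actually accessed are mapped into RAM.
x <- ff(vmode = "double", length = 1e6)
x[1:5] <- 1:5   # writes go through the page cache to disk
x[1:5]          # reads map only the needed section
filename(x)     # path of the backing file on disk
```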

165 questions
0
votes
1 answer

How to subset ffdf by index?

I would like to subset an ffdf object by index, returning another ffdf object. The help file on subset.ffdf indicates that you can pass a range index (ri) object as an argument, but when I tried: data_subset <- subset.ffdf(data, ri(1, 1e5)) I got…
travis
  • 5
  • 3
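For reference, one hedged way to take an index-based subset (assuming ff and ffbase are loaded and `data` is an existing ffdf; the names are illustrative):

```r
# Hedged sketch: `data` is assumed to be an existing ffdf.
library(ff)
library(ffbase)   # extends ffdf with many base-R-like methods

idx <- ri(1, 1e5)           # range index covering rows 1..100000
data_subset <- data[idx, ]  # row indexing returns another ffdf
# subset.ffdf() expects a logical filter expression, which is one
# reason passing ri(1, 1e5) as its second argument can fail.
```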
0
votes
0 answers

R ff package : 2Gb limit?

I have a dataset with 5G lines, too big to import as-is in R-base. My understanding is that this limit arises from the use of 32-bit indexes on vectors: vectors of up to 2^31 - 1 elements are allowed, even in a 64-bit version of R. So I am…
Olivier Delrieu
  • 742
  • 6
  • 16
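The limit the question alludes to can be checked directly in base R; the 2^52 figure below is the documented long-vector bound from R 3.0.0 onward:

```r
# The historical vector-length limit comes from 32-bit indexing:
.Machine$integer.max   # 2147483647, i.e. 2^31 - 1
# Since R 3.0.0, 64-bit builds allow "long vectors" of up to 2^52
# elements, but many functions and packages still assume lengths
# fit in an integer -- which is where ff's disk-backed storage helps.
```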
0
votes
1 answer

Issue using ff with SVM function in library(e1071)

I am trying to use an ff object to run an SVM classification study. I converted my data frame to an ff object using ffdf <- as.ffdf(signalDF). The dataset has 1024 columns and ~600K rows. When I run the function, svm(Y~.,…
John Smith
  • 51
  • 6
0
votes
0 answers

Error: cannot allocate vector of size 61.6 Gb In addition: Warning messages: 1: Reached total allocation of 8191Mb:

I get this error when I try to use data that has 4 datetime columns. I tried the ff package, but it again leads to errors that change the datatype of the dependent variable in my linear regression model. Please let me know how to use…
VIJU
  • 5
  • 3
0
votes
2 answers

Handling large datasets in R

I'm working on relatively large datasets (5 files of 2 GB each; to give an order of magnitude, one of the tables is 1.5M rows x 270 columns), where I use the dplyr left_join function (between these datasets and other small tables). The tables contain…
nidabdella
  • 811
  • 8
  • 24
0
votes
1 answer

How can you generate a large empty (zeros) numeric ffdf in R?

Let's say that I am trying to generate a large empty matrix of zeros that I can fill from the data (e.g. count data) using the ff package: require(ff); require(ffdf). If there are 15,000 columns (variables) and 20 rows (observations), I could do the…
Brian Jackson
  • 409
  • 1
  • 5
  • 16
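One hedged way to get a disk-backed block of zeros (assuming the ff package; note that ffdf is column-oriented, so 15,000 columns means 15,000 separate ff vectors, and an ff matrix is often the lighter choice):

```r
library(ff)

# A disk-backed 20 x 15000 matrix of zeros:
m <- ff(0, dim = c(20, 15000), vmode = "double")

# If an ffdf is required, each column must be its own ff vector,
# which is expensive with 15,000 columns -- shown here with 5:
cols <- lapply(1:5, function(i) ff(0, length = 20, vmode = "double"))
names(cols) <- paste0("V", 1:5)
zdf <- do.call(ffdf, cols)
```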
0
votes
0 answers

Handling a big matrix in R

I have a very large term-document matrix in R with dimensions 81094 x 14177. On trying to convert it to a normal matrix I get the error Error: cannot allocate vector of size 8.6 Gb. The code that I have used is new_matrix =…
NinjaR
  • 621
  • 6
  • 22
0
votes
1 answer

ffbase: merge on columns X and Y and closest column Z

I would like to accomplish the following using ffdf: merge on columns X and Y and the closest Time, and then merge on the closest column B. However, the procedure that I know from smaller samples involves using outer merges (as shown below). What is a way…
dleal
  • 2,244
  • 6
  • 27
  • 49
0
votes
3 answers

Efficient way to read file larger than memory in R

This reference https://www.r-bloggers.com/efficiency-of-importing-large-csv-files-in-r/ compares reading a file using fread versus ffdf. I am currently trying to read a csv file that is about 60 GB while my available RAM is 16 GB. It takes…
dleal
  • 2,244
  • 6
  • 27
  • 49
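A common pattern for files larger than RAM is chunked import with read.csv.ffdf; a sketch, assuming the ff package is installed (the file name and chunk sizes are illustrative):

```r
library(ff)

# Parse the file in chunks: only `next.rows` rows are held in RAM
# at a time, while the accumulated result lives on disk as an ffdf.
dat <- read.csv.ffdf(file = "big.csv",
                     header = TRUE,
                     first.rows = 100000,  # chunk used to guess types
                     next.rows  = 500000)  # subsequent chunk size
nrow(dat)
```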
0
votes
0 answers

A file with 135M rows and 22 cols has a random head note in the first 4 lines followed by column headers. How do I skip the first few lines with the LaF package in R?

I am using the LaF package to read this large file with 135M rows and 22 cols, ~15 GB of raw data, pipe-delimited. The raw file unfortunately has random head notes in the first 4 lines followed by column headers. Edit: I am sorry, I should have…
SatZ
  • 430
  • 5
  • 14
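One hedged approach with LaF is its `skip` argument (file name and column types below are illustrative): skip the 4 head-note lines plus the header row, and supply column names yourself.

```r
library(LaF)

# Skip the 4 head-note lines and the header row (5 lines total),
# then read the 22 pipe-delimited columns in blocks.
laf <- laf_open_csv("big_file.txt",
                    column_types = rep("string", 22),
                    column_names = paste0("col", 1:22),
                    sep  = "|",
                    skip = 5)
block <- next_block(laf, nrows = 100000)  # process chunk by chunk
```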
0
votes
0 answers

Raster processing 8GB more in R

I am currently using R code to calculate heat stroke, but it has been impossible for me to process more than 100 MB. The code works on a small raster elevation model, but it has been impossible for me to use a larger raster (raster DEM Colombia…
0
votes
2 answers

Parsing as.transactions in R

I've been working on rewriting my code that worked with data.frames to work with ffdf. I had two columns, and after a lot of fuss I've managed to do a split and get a list that looks like this: data= $A 1 2 3 $B 4 5 6 where A, B are the "baskets"…
Centar 15
  • 127
  • 1
  • 13
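For reference, arules can coerce a named list of character item vectors straight to transactions; a small self-contained sketch (assuming the arules package is installed):

```r
library(arules)

# Named list of baskets, with items as character vectors:
baskets <- list(A = c("1", "2", "3"),
                B = c("4", "5", "6"))
trans <- as(baskets, "transactions")  # S4 coercion from arules
summary(trans)
```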
0
votes
0 answers

ff package reads a text file incorrectly

I would like to read a large text file separated by "|", so I used the code below. sampleData <- read.table(file = '2013_4MM01_7-11_CD.txt',header =TRUE, sep = '|', nrows=10) pos<- read.table.ffdf(file="2013_4MM01_7-11_CD.txt", header=TRUE,…
Daeseong
  • 1
  • 1
0
votes
0 answers

Row limit in read.table.ffdf?

I'm trying to import a very large dataset (101 GB) from a text file using read.table.ffdf in package ff. The dataset has >285 million records, but I am only able to read in the first 169,457,332 rows. The dataset is tab-separated with 44…
Michel
  • 1
  • 1
0
votes
0 answers

not all RAM is released after gc() after using ffdf object in R

I am running the script as follows: library(ff) library(ffbase) setwd("D:/My_package/Personal/R/reading") x<-cbind(rnorm(1:100000000),rnorm(1:100000000),1:100000000) system.time(write.csv2(x,"test.csv",row.names=FALSE)) #make ffdf object with…
Dimon D.
  • 438
  • 5
  • 23
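On the RAM question, a hedged sketch of the usual teardown for ff objects: rm() plus gc() only frees the R-side handle and page cache, while close() and delete() deal with the memory-mapped backing file itself.

```r
library(ff)

x <- ff(rnorm(1e6))  # disk-backed vector with an in-RAM page cache
close(x)             # flush and unmap the backing file
delete(x)            # remove the backing file from disk
rm(x)                # drop the R handle
gc()                 # now the remaining cached pages can be reclaimed
```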