Questions tagged [ff]

An R package that provides memory-efficient storage of large data on disk and fast access functions

The ff package provides data structures that are stored on disk but behave (almost) as if they were in RAM by transparently mapping only a section (pagesize) in main memory.
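
A minimal sketch of what that looks like in practice; the vector length is arbitrary and chunk() picks its block size from options("ffbatchbytes"):

    library(ff)

    # A 10-million-element double vector backed by a file on disk; only small
    # pages of it are mapped into RAM at any one time.
    x <- ff(vmode = "double", length = 1e7)

    # Read and write it like an ordinary vector
    x[1:5] <- c(10, 20, 30, 40, 50)
    x[1:5]

    # Process the whole vector chunk by chunk to keep memory use bounded
    total <- 0
    for (i in chunk(x)) {
      total <- total + sum(x[i])
    }
    total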

165 questions
2 votes, 1 answer

My data.table join exceeds memory limit on 64bit Windows machine with 32gb RAM

Background: I have some data manipulation to do on a very large data.table we'll call d1 (~125 million rows x 10 columns) that I read into R from .csv form using fread. The data is about car insurance -- transmission-related and engine-related claims.…
asked by logjammin
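
A hedged sketch of the usual out-of-RAM pattern for this kind of join, assuming the big table sits in d1.csv, the small side of the join fits in RAM, and the key column is called policy_id (all three names are hypothetical; note that ffdf columns must be numeric/factor, not character):

    library(ff)
    library(ffbase)
    library(data.table)

    # Stream the ~125M-row table into an on-disk ffdf instead of a data.table
    d1     <- read.csv.ffdf(file = "d1.csv", first.rows = 1e5, next.rows = 5e5)
    lookup <- fread("lookup.csv")             # small side of the join, kept in RAM

    out <- NULL
    for (i in chunk(d1)) {
      block  <- as.data.table(d1[i, ])        # only this chunk is in RAM
      joined <- merge(block, lookup, by = "policy_id", all.x = TRUE)
      joined <- as.data.frame(joined)
      if (is.null(out)) {
        out <- as.ffdf(joined)                # result accumulates on disk
      } else {
        out <- ffdfappend(out, joined)
      }
    }
    dim(out)
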
2 votes, 0 answers

How to convert a very large 40gb ffdf to a disk.frame?

Had it been smaller, it would not have been difficult to use the as.data.table.ffdf function. But as it is, the file is much larger than my RAM. Is there any way I can convert it, or do I need to write it to disk and then reload it?
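
One hedged way around it is to never hold more than one chunk in RAM: walk the ffdf with chunk(), dump each chunk to its own CSV, and let disk.frame rebuild from those files. That csv_to_disk.frame() accepts a vector of input files is an assumption here; if your disk.frame version does not support that, add the chunks one at a time with add_chunk() instead.

    library(ff)
    library(ffbase)
    library(data.table)
    library(disk.frame)

    # 'big' is the existing 40 GB ffdf; all paths are hypothetical
    dir.create("ffdf_chunks", showWarnings = FALSE)
    files <- character(0)

    k <- 0
    for (i in chunk(big)) {            # chunk size governed by options("ffbatchbytes")
      k <- k + 1
      f <- file.path("ffdf_chunks", sprintf("chunk_%05d.csv", k))
      fwrite(big[i, ], f)              # only this chunk is ever in RAM
      files <- c(files, f)
    }

    # Assumed: csv_to_disk.frame() takes a vector of CSV files
    dfr <- csv_to_disk.frame(files, outdir = "my_disk_frame")
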
2 votes, 0 answers

Reading very large CSV files with many columns in R

I am dealing with very large CSV files of 1-10 GB. I have figured out that I need to use the ff package for reading in the data. However, this does not seem to work. I suspect that the problem is that I have approximately 73,000 columns, and since ff…
asked by Maria KA
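
A hedged sketch of the read.csv.ffdf() call for a file this wide; the file name and the all-numeric colClasses are assumptions. Note also that, by default, ff keeps each ffdf column in its own backing file, so ~73,000 columns means ~73,000 files, which some filesystems handle badly.

    library(ff)

    wide <- read.csv.ffdf(
      file       = "big_wide.csv",            # hypothetical file name
      first.rows = 100,                       # small first block to establish the structure
      next.rows  = 5000,                      # larger blocks once the ffdf exists
      colClasses = rep("numeric", 73000),     # assumed: every column is numeric
      VERBOSE    = TRUE
    )
    dim(wide)
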
2 votes, 2 answers

Still struggling with handling large data set

I have been reading around on this website and haven't been able to find the exact answer. If it already exists, I apologize for the repost. I am working with data sets that are extremely large (600 million rows, 64 columns on a computer with 32 GB…
asked by swads
2 votes, 0 answers

How does multiple write access work with ff files?

We are using R 3.3.1 and the packages ff and Rserve under Win 7 to write to an ff file on some server from separate Rserve processes on different hosts. The Rserve processes are constantly performing tasks and are updating the central ff file at the…
asked by tscherg
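
ff itself has no concurrency control, so coordination has to be built around it. A hedged sketch of one pattern, assuming every Rserve process already holds the same ff object (e.g. restored once with ffload()) and that a lock file is reachable from all hosts; the filelock package and the paths are assumptions, not part of ff, and advisory locks on network shares are not always reliable.

    library(ff)
    library(filelock)                          # assumption: any advisory lock works

    lockpath <- "/srv/shared/central.ff.lock"  # hypothetical shared path

    safe_update <- function(central, idx, values) {
      lck <- lock(lockpath)                    # blocks until no other writer holds it
      on.exit(unlock(lck), add = TRUE)
      open(central, readonly = FALSE)          # (re)open the backing file for writing
      central[idx] <- values
      close(central)                           # flush and release the mapping
      invisible(TRUE)
    }

    # Readers open the same file read-only between updates:
    # open(central, readonly = TRUE); central[1:10]; close(central)
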
2 votes, 0 answers

How to compute this huge Correlation Matrix?

I have a huge matrix with nrow=144 and ncol=156267 containing numbers and I would like to compute the correlation between all the columns. This can be done using the bigcor function described here:…
asked by NKGon
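
The bigcor() function the question points to takes roughly this approach: keep the 156,267 x 156,267 result as an ff matrix on disk and fill it block by block. A hand-rolled sketch of that idea, assuming the 144 x 156,267 input matrix m fits in RAM (the full result is on the order of 195 GB on disk, so check free space first):

    library(ff)

    p    <- ncol(m)
    blk  <- 2000                                # columns per block
    cmat <- ff(vmode = "double", dim = c(p, p)) # result lives on disk

    starts <- seq(1, p, by = blk)
    for (i in starts) {
      ii <- i:min(i + blk - 1, p)
      for (j in starts) {
        if (j < i) next                         # compute upper triangle, mirror below
        jj <- j:min(j + blk - 1, p)
        cc <- cor(m[, ii], m[, jj])             # block of cross-correlations
        cmat[ii, jj] <- cc
        if (j > i) cmat[jj, ii] <- t(cc)
      }
    }
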
2 votes, 1 answer

Efficient Combination and Operating on Large Data Frames

I have 2 relatively large data frames in R. I'm attempting to merge / find all combos, as efficiently as possible. The resulting df turns out to be huge (the length is dim(myDF1)[1]*dim(myDF2)[1]), so I'm attempting to implement a solution using ff.…
asked by ch-pub
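
A hedged sketch of building that dim(myDF1)[1] * dim(myDF2)[1]-row result on disk instead of in RAM: cross-join one slice of myDF1 at a time and append each slice to an ffdf (character columns would need to be factors for the ffdf step to work):

    library(ff)
    library(ffbase)

    rows_per_slice <- 1000
    out <- NULL

    for (start in seq(1, nrow(myDF1), by = rows_per_slice)) {
      idx   <- start:min(start + rows_per_slice - 1, nrow(myDF1))
      combo <- merge(myDF1[idx, , drop = FALSE], myDF2, by = NULL)  # Cartesian product
      if (is.null(out)) {
        out <- as.ffdf(combo)
      } else {
        out <- ffdfappend(out, combo)
      }
    }
    nrow(out)    # == nrow(myDF1) * nrow(myDF2)
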
2 votes, 1 answer

R - ff package: find the most frequent element in ffdf and delete the rows where it is located

I need a suggestion on how to find the most frequent element in an ffdf and then delete the rows where it is located. I decided to try the ff package as I'm working with very big data, and with base R I run out of memory. Here is a little…
asked by pshls
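
A hedged two-pass sketch, assuming the ffdf is called dat and the column of interest, value, is stored as a factor (both names are hypothetical): count it chunk by chunk to find the mode, then copy every other row into a new ffdf. ffbase also ships table()/subset() methods for ff objects that may shorten both passes.

    library(ff)
    library(ffbase)

    # Pass 1: tally the column chunk by chunk; each chunk is a factor with the
    # full level set, so the per-chunk tables line up and can simply be summed
    counts <- NULL
    for (i in chunk(dat)) {
      tab    <- table(dat$value[i])
      counts <- if (is.null(counts)) tab else counts + tab
    }
    mode_val <- names(counts)[which.max(counts)]

    # Pass 2: keep only the rows where 'value' is not the mode, accumulating on disk
    keep <- NULL
    for (i in chunk(dat)) {
      block <- dat[i, ]
      block <- block[block$value != mode_val, , drop = FALSE]
      if (nrow(block) == 0) next
      if (is.null(keep)) {
        keep <- as.ffdf(block)
      } else {
        keep <- ffdfappend(keep, block)
      }
    }
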
2 votes, 1 answer

How to load 35 GB data into R?

I have a data set with dimensions of 20 million records and 50 columns. Now I want to load this data set into R. My machine's RAM is 8 GB and my data set is 35 GB. I have to run my R code on the complete data. So far I have tried…
asked by kondal
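
A hedged sketch of the standard ff route: parse the CSV once into an on-disk ffdf, then persist it with ffsave() so later sessions can reattach without re-parsing. The file names are hypothetical.

    library(ff)
    library(ffbase)

    # Only 'next.rows' rows are in RAM at any time, so 8 GB of RAM is enough
    # (the initial parse of 35 GB just takes a while).
    dat <- read.csv.ffdf(file = "bigdata.csv",
                         first.rows = 1e5,
                         next.rows  = 5e5,
                         VERBOSE    = TRUE)

    # Persist the ffdf so later sessions can attach it without re-parsing the CSV
    ffsave(dat, file = "bigdata_ff")        # writes bigdata_ff.ffData / .RData
    # ...later, in a fresh session:
    # library(ff); ffload("bigdata_ff")     # restores 'dat', still disk-backed
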
2 votes, 1 answer

Data.table setDT functionality in ff/ffbase R packages

Calculate a column of conditional means with the ff/ffbase packages. I'm searching for functionality in the ff/ffbase packages that allows data manipulation similar to what is carried out below with the data.table package: library(data.table) irisdf <-…
asked by Qbik
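
The closest thing in ffbase is probably ffdfdply(): it pulls one batch of 'split' groups into RAM, runs an ordinary data.frame/data.table expression on it, and appends the results to a new ffdf. A hedged sketch with iris, mirroring a data.table idiom of the kind the question describes (its exact code is truncated above):

    library(ff)
    library(ffbase)
    library(data.table)

    irisdf <- as.data.table(iris)
    irisff <- as.ffdf(iris)

    # data.table version: a column of conditional (per-Species) means
    irisdf[, m := mean(Sepal.Length), by = Species]

    # ff/ffbase version: group batches are processed in RAM one at a time
    res <- ffdfdply(irisff,
                    split = irisff$Species,
                    FUN   = function(d) {
                      d <- as.data.table(d)
                      d[, m := mean(Sepal.Length), by = Species]
                      as.data.frame(d)
                    })
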
2 votes, 0 answers

Working with Large Fixed Length Files in R using ff

I did some research and the ff package seems to have what I am looking for. However, I have no idea how to use it in my current scenario. Here's what I've got: I have a fixed-length file with no row terminator (all data is on one line), and the record length is…
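
With no row terminator, read.table.ffdf cannot split records for you, but you can do it by hand: read whole records with readChar(), cut the fields out with substring(), and push each block into an ffdf. A hedged sketch with a made-up layout (record length, field positions, and file name are all assumptions):

    library(ff)
    library(ffbase)

    rec_len        <- 100                     # assumed fixed record length
    recs_per_block <- 50000

    con <- file("fixed_width.dat", open = "rb")
    out <- NULL
    repeat {
      txt <- readChar(con, nchars = rec_len * recs_per_block, useBytes = TRUE)
      if (length(txt) == 0 || nchar(txt, type = "bytes") == 0) break
      n_rec  <- nchar(txt, type = "bytes") %/% rec_len
      starts <- (seq_len(n_rec) - 1) * rec_len
      block  <- data.frame(                   # assumed layout: id in 1-10, value in 11-20
        id    = as.integer(substring(txt, starts + 1,  starts + 10)),
        value = as.numeric(substring(txt, starts + 11, starts + 20))
      )
      if (is.null(out)) {
        out <- as.ffdf(block)
      } else {
        out <- ffdfappend(out, block)
      }
    }
    close(con)
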
2 votes, 2 answers

reshaping a large data frame from wide to long in R

I've been through the various reshape questions but don't believe this iteration has been asked before. I am dealing with a data frame of 81K rows and 4188 variables. Variables 161:4188 are the measurements present as different variables. The idvar…
asked by vagabond
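
If the melted result (81K rows x roughly 4,000 measurement columns, i.e. well over 300 million long rows) is what blows up RAM, one hedged workaround is to melt a block of rows at a time and append each block to an ffdf. The column split below follows the question's description, but the id columns are assumed to be numeric or factor (ffdf cannot hold character columns):

    library(ff)
    library(ffbase)
    library(data.table)

    # 'wide' is the 81K x 4188 table; columns 161:4188 are the measurements
    id_cols      <- names(wide)[1:160]
    measure_cols <- names(wide)[161:ncol(wide)]

    rows_per_block <- 5000
    long <- NULL
    for (start in seq(1, nrow(wide), by = rows_per_block)) {
      idx   <- start:min(start + rows_per_block - 1, nrow(wide))
      block <- melt(as.data.table(wide[idx, ]),
                    id.vars       = id_cols,
                    measure.vars  = measure_cols,
                    variable.name = "measurement",
                    value.name    = "value")
      block <- as.data.frame(block)
      if (is.null(long)) {
        long <- as.ffdf(block)
      } else {
        long <- ffdfappend(long, block)
      }
    }
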
2 votes, 0 answers

assign values to a big matrix in R

I have a matrix object, named location, with three columns (ID, latitude, longitude) and 18,289 rows: # ID latitude longitude # 320503 31.29530 120.5735 # 310104 31.18852 121.4365 # 310115 31.22152 121.5444 …
asked by chankey
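
The excerpt stops before saying what the big matrix holds, so here is only a hedged sketch of the general pattern: put the large result matrix on disk with ff() and write it one row-block at a time (the pairwise "distance" below is just a placeholder for whatever the real values are):

    library(ff)

    n   <- nrow(location)
    lat <- location[, 2]                    # latitude column
    lon <- location[, 3]                    # longitude column

    # n x n result on disk; only one row-block is ever in RAM
    big <- ff(vmode = "double", dim = c(n, n))

    blk <- 1000
    for (start in seq(1, n, by = blk)) {
      idx   <- start:min(start + blk - 1, n)
      block <- sqrt(outer(lat[idx], lat, "-")^2 + outer(lon[idx], lon, "-")^2)
      big[idx, ] <- block                   # placeholder values, written straight to disk
    }
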
2 votes, 0 answers

R ffdfappend SIGBUS error

I have an R script which uses the ffbase and ff packages. On Windows the script runs fine. On Linux (a different box, though with more RAM) it crashes with a bus (SIGBUS) error. Windows (Version 6.1.7601) session info: R version 3.1.0…
asked by user444628
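
Without the full question this is only a guess, but on Linux a SIGBUS while touching a memory-mapped file often means the filesystem holding that file ran out of space, and ff's backing files default to tempdir(), which is frequently a small /tmp. One hedged thing to try is pointing fftempdir at a roomy filesystem before any ff objects are created (the path below is hypothetical):

    library(ff)

    # Put ff's backing files on a filesystem with plenty of free space
    options(fftempdir = "/data/fftmp")
    dir.create(getOption("fftempdir"), recursive = TRUE, showWarnings = FALSE)

    # subsequent ff() / ffdf() / ffdfappend() calls create their files there
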
2 votes, 0 answers

Performance Issues with DocumentTermMatrix

I am trying to create two Document Term Matrices like so: title_train <- DocumentTermMatrix(title_corpus_train, control = list(dictionary = title_dict)) title_test <- DocumentTermMatrix(title_corpus_test, control = list(dictionary =…
asked by user1477388