Questions tagged [ff]

An R package that provides memory-efficient storage of large data on disk and fast access functions

The ff package provides data structures that are stored on disk but behave (almost) as if they were in RAM, by transparently mapping only a section of the file (one page, of size pagesize) into main memory.
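The following is a minimal sketch of what this looks like in practice, assuming the ff package is installed from CRAN. It creates a disk-backed double vector, writes and reads a few elements, and then sums the whole vector chunk by chunk with `chunk()`, so that only one range of the file is pulled into RAM at a time. The vector length and the values written are arbitrary illustrations, not recommended settings.

```r
library(ff)  # assumes ff is installed, e.g. via install.packages("ff")

# A disk-backed double vector with one million elements.
# The data live in a temporary file; ff maps only a page of it into RAM at a time.
x <- ff(vmode = "double", length = 1e6)

# Writes and reads go through the mapped file rather than a full in-RAM copy.
x[1:5] <- c(1.5, 2.5, 3.5, 4.5, 5.5)
first5 <- x[1:5]  # a plain R double vector holding just these five elements
print(first5)

# Path of the backing file on disk
print(filename(x))

# chunk() returns a list of index ranges; each x[i] pulls only that range
# into RAM before it is summed, so the full vector is never materialised.
total <- 0
for (i in chunk(x)) {
  total <- total + sum(x[i])
}
print(total)

# Delete the backing file once the object is no longer needed.
delete(x)
rm(x)
```

The chunked loop is the usual pattern for any reduction over data too large for RAM: pick an operation that can be combined across chunks (here a running sum) and apply it one range at a time.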


165 questions
1 vote, 1 answer

How to read a large dataset from a .rar file in R?

I have a dataset that is 4 GB compressed and more than 20 GB uncompressed. The file can be downloaded here. I have tried several ways to load it, but none of them has worked. There are similar questions on Stack Overflow (question1, question2) I…
Henry Navarro
1 vote, 1 answer

Handling big data with ff

I'm working with a dataset of 16 GB. This is of course too large to load into RAM, so I need to use some sort of big-data handling method in R. My dataset consists of a lot of variables and most of them are character variables like names and…
1 vote, 0 answers

Sorting an ffdf (ff package) in R language

I have a big file (size > 4 GB) loaded into an ffdf. The file has a table structure where each line follows the pattern "string, \t, num, \t, num". How can I sort the ffdf by the first column? After sorting, I have to print the ffdf to another…
1 vote, 0 answers

R Amending FF Factors

I am using factors to handle character strings in ff.

    tt <- ff(factor(c("a","b","c")), names=c("c1","c2","c3"))
    tt
    ff (open) integer length=3 (3) levels: a b c
    c1 c2 c3
     a  b  c

When I try to change one of the data items I get the…
luca
1 vote, 1 answer

Dealing with big datasets in R

I'm having a memory problem: R gives the "Cannot allocate vector of size XX Gb" error message. I have a bunch of daily files (12784 days) in NetCDF format giving sea surface temperature on a 1305x378 (longitude-latitude) grid. That gives 493290…
pacomet
1 vote, 1 answer

One-to-many using ffbase in R

I would like to replicate the following one-to-many join using ffdf. What would be the best way to do this? Below I present an example of what I would like to get, using data.table. I am aware of the following description of the merge.ffdf…
dleal
1 vote, 1 answer

Reassigning values to columns in ffdf [R]

I am having trouble doing the following operations on a larger dataset. I wonder if there is a built-in way to do it with either ff or ffdf. Example: modifying a character column in an ffdf object using substr and reassigning it as a different…
dleal
1 vote, 1 answer

Replacing numbers with letters in string

I have an ID column with names like "155AB3EA157A3466887D8F4B99BABC35". I want to replace the numbers in these strings with letters. I've tried using gsub, but it produces an "invalid text argument" error. My code looks like…
Erik
1 vote, 1 answer

Query SQL Server from R with ETLUtils for big tables

Normally, to query a SQL Server database from R, I'd use:

    library(RODBC)
    con <- odbcConnect(dsn = "ESTUDIOS", uid = "estudios", pwd = "yyyy")
    sql_trx <- "SELECT [Fecha], [IDServicio] FROM [ESTUDIOS].[dbo].[TRX] where MONTH(Fecha) =…
Ariel
1 vote, 0 answers

How to use the ff package to read and combine rds files to resolve a memory issue?

I have a list that contains several large files. All the files have the same column names. I want to combine them into one rds file and save it, with the filename in a column. I have read some of the SO answers and have written the following code. But I…
runjumpfly
1 vote, 1 answer

Splitting ff_vectors in R

Is there a way to perform a basic split of ff_vectors without any sums or other aggregation? I have an ffdf called res2 consisting of 2 ff_vectors, and I want the following, from an ffdf like this:

    A B
    a 1
    a 2
    b 4
    b 5

result…
Centar 15
1 vote, 0 answers

Using lapply on a list of ff objects - R

I have a huge set of data that I have split across a list (33 years of hourly precipitation data for South America). Every object in the list is an ff array of data (one array per year; I have to use the ff package to avoid running out of RAM).…
JulianGiles
1 vote, 2 answers

Getting the mean for each period in an ffdf object in R

I have an ffdf object called 'group1' that has a million rows of data that looks like this:

      Location DateandTime         Reading Group
    1        1 01/01/2012 00:00:00     0.8     1
    2        1 …
Clover
1 vote, 1 answer

VLOOKUP method for an ffdf object in R

I have an ffdf object called 'data' with over 26 million rows that looks like this:

      Location DateandTime         Value
    1        1 01/01/2012 00:00:00   0.8
    2       42 01/01/2012…
Clover
1 vote, 1 answer

Efficient way to calculate the mode of an mcmc.list object in R, or how to convert an mcmc.list to an ff data type? (Handling big data of type mcmc.list)

In R, I am using rjags, which calls JAGS to sample from a posterior distribution and returns the samples as an mcmc.list. My aim is to compute the mode of each sampled variable (from the first chain) in the mcmc.list. When I load the…
Bongozil