Questions tagged [ff]

An R package that provides memory-efficient storage of large data on disk and fast access functions

The ff package provides data structures that are stored on disk but behave (almost) as if they were in RAM, by transparently mapping only a section (the pagesize) into main memory at a time.
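For example, a minimal session creating a disk-backed vector that is paged in on demand (object names here are placeholders):

    library(ff)

    # Create a disk-backed integer vector; only pages of it are
    # mapped into RAM when accessed.
    x <- ff(vmode = "integer", length = 1e6)
    x[1:5] <- 1:5     # reads and writes look like normal vector access
    x[1:5]
    filename(x)       # path of the backing file on disk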

165 questions
0 votes • 1 answer

How to speed up duplicate checking for a huge ffdf

I have a list of ffdfs; together they would take up about 76 GB of RAM if loaded into memory instead of being accessed through the ff package. The following are their respective dimensions: > ffdfs |> sapply(dim) [,1] [,2] [,3] [,4] [,5] [,6] [,7] [1,]…
Chris LAM • 142 • 1 • 7

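A sketch of one chunkwise approach to the question above, assuming a hypothetical ffdf x with a single key column id: scan the rows a million at a time and track the keys seen so far in a hash environment, so only one chunk of keys is ever materialized (this works when the distinct keys themselves fit in RAM):

    library(ff)

    n    <- nrow(x)            # 'x' and 'id' are assumed names
    dup  <- logical(n)
    seen <- new.env(hash = TRUE)
    for (start in seq(1, n, by = 1e6)) {
      idx <- start:min(start + 1e6 - 1, n)
      ids <- as.character(x$id[idx])   # pull one chunk of keys into RAM
      dup[idx] <- vapply(ids, function(k) {
        if (exists(k, envir = seen, inherits = FALSE)) TRUE
        else { assign(k, TRUE, envir = seen); FALSE }
      }, logical(1), USE.NAMES = FALSE)
    }
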
0 votes • 1 answer

Error using ff package: Error in ff(initdata = initdata, ... : write error

I am trying to load a CSV file containing 517,000 lines and only 20 variables. I am using read.table.ffdf, and it gives the error: Error in ff(initdata = initdata, length = length, levels = levels, ordered = ordered, : write error. I used the…
math geek • 1 • 1

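A common cause of this write error is ff's temporary directory sitting on a full or unwritable disk; pointing fftempdir somewhere with free space is a frequent first fix. The path below is only an example:

    library(ff)

    # Direct ff's backing files to a disk with enough free space
    # (hypothetical path).
    options(fftempdir = "D:/ff_temp")
    dir.create(getOption("fftempdir"), showWarnings = FALSE)
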
0 votes • 1 answer

Why is the ffdf object so large?

I use read.csv.ffdf from the ff package to load an 830 MB CSV file, which has about 8,800,000 rows and 19 columns: library(ff) library(ffbase) green_2018_ff <- read.csv.ffdf("green_2018.csv", header = TRUE) But when I check the size of green_2018_ff…
Kim.L • 121 • 10

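Part of the answer is simple arithmetic: ff stores a double column as 8 bytes per cell, so 8,800,000 rows × 19 columns × 8 bytes is already about 1.3 GB, more than the 830 MB CSV. A quick way to see what storage mode each column got (a sketch, assuming the object from the question and that vmode reports per-column modes for an ffdf):

    library(ff)

    # "double" costs 8 bytes per cell on disk even when the CSV field
    # was short; "integer" and small factors are cheaper.
    vmode(green_2018_ff)
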
0 votes • 1 answer

Set column types for a CSV with read.csv.ffdf

I am using a payments dataset from the Austin, Texas open data portal. I am trying to load the data with the following code: library(ff) asd <- read.table.ffdf(file = "~/Downloads/Fiscal_Year_2010_eCheckbook_Payments.csv", first.rows = 100, next.rows = 50, FUN =…
Shawn Brar • 1,346 • 3 • 17

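read.table.ffdf forwards extra arguments to the underlying read function, so column types can be pinned with colClasses; note that ff has no character vmode, so text columns should be declared "factor". A sketch with placeholder types (the real file has more columns):

    library(ff)

    asd <- read.csv.ffdf(
      file       = "~/Downloads/Fiscal_Year_2010_eCheckbook_Payments.csv",
      first.rows = 10000,
      next.rows  = 50000,
      # one entry per column; these types are placeholders
      colClasses = c("factor", "numeric", "integer")
    )
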
0 votes • 0 answers

How to pass ff objects to cluster/nodes

I have my data as ff arrays stored on disk, and I'm trying to use parallel computation. The arrays' names are stored in the data frame DATA.wind, in the column Smoothed_name. DATA.wind <- subset(DATA, DATA$variable == "u" | DATA$variable == "v") #…
ahmathelte • 559 • 3 • 15

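One pattern often suggested for combining ff with the parallel package (a sketch, not official ff guidance): close the ff object in the master, export it (the object carries its backing filename), and reopen the mapping inside each worker:

    library(ff)
    library(parallel)

    # 'arr' is a hypothetical two-dimensional ff array already on disk
    close(arr)                       # release the mapping in the master
    cl <- makeCluster(4)
    clusterExport(cl, "arr")
    res <- parLapply(cl, 1:4, function(i) {
      library(ff)
      open(arr, readonly = TRUE)     # remap the backing file in this worker
      s <- sum(arr[, i])             # work on one hypothetical slice
      close(arr)
      s
    })
    stopCluster(cl)
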
0 votes • 1 answer

Best way of parsing huge (10GB+) JSON files

I would like to know the best tool, IDE, or programming language for parsing data stored as a JSON file. I tried pandas in Python and ff in R, and both either crash with memory issues or take too long to process. Do you have experience…
BlueMountain • 197 • 2 • 17

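For line-delimited JSON (one object per line), jsonlite can stream the file in fixed-size batches so it never has to fit in memory at once; a single giant JSON array would first need converting to that layout. A sketch with a hypothetical file:

    library(jsonlite)

    con <- file("big.ndjson", open = "r")
    stream_in(con, handler = function(df) {
      # each batch arrives as a small data frame; aggregate or write
      # it out here instead of keeping it
      print(nrow(df))
    }, pagesize = 10000)
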
0 votes • 3 answers

Separating time and date values from a timestamp/DateTime column in an ffdf

I am a relatively new R user and this is my first question on Stack Overflow, so apologies if my question is unclear or obviously answered somewhere else. I have a large dataset (7.8 GB, 137 million observations) that I have loaded into R as an ffdf…
arj • 25 • 6

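A chunkwise sketch for the problem above, assuming the ffdf is called dat and holds a timestamp column of epoch seconds (both the names and the encoding are assumptions): convert a slice at a time and append the derived columns to a CSV, so RAM only ever holds one chunk:

    library(ff)

    n <- nrow(dat)
    for (start in seq(1, n, by = 1e6)) {
      idx <- start:min(start + 1e6 - 1, n)
      ts  <- as.POSIXct(dat$timestamp[idx], origin = "1970-01-01", tz = "UTC")
      out <- data.frame(date = format(ts, "%Y-%m-%d"),
                        time = format(ts, "%H:%M:%S"))
      write.table(out, "date_time_split.csv", sep = ",",
                  append = start > 1, col.names = start == 1,
                  row.names = FALSE)
    }
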
0 votes • 1 answer

Basic example not working for ffwhich from the ffbase package

I'm trying to use OHDSI's version of the SelfControlledCaseSeries package, which uses the ff package to handle big data. But something is not working with the ffwhich function. Running the following example, provided in the ffwhich…

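For reference, the basic ffwhich pattern looks like this (small sizes just to keep the sketch cheap):

    library(ff)
    library(ffbase)

    x   <- ff(sample(10, 1000, replace = TRUE))  # small ff integer vector
    idx <- ffwhich(x, x > 5)    # ff vector of positions where x > 5
    x[idx[1:6]]                 # materialize a few matched values
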
0 votes • 0 answers

Memory problem with a data frame of 200k records

I have 2 matrices, each with 200k records (one is a large get_sentences output, review_phrases; the other is review_scores). I bound them into a data frame and need to write it to a CSV, but I get a memory error. What should I do? Do the packages bigmemory or…
The_Coder • 29 • 5

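One way to sidestep a single huge write is to stream the data frame to disk in slices; base R's write.table can append, as in this sketch:

    # Write 'df' to 'path' in slices so no single call needs a huge buffer.
    write_csv_chunked <- function(df, path, chunk_size = 50000) {
      n <- nrow(df)
      for (start in seq(1, n, by = chunk_size)) {
        idx <- start:min(start + chunk_size - 1, n)
        write.table(df[idx, , drop = FALSE], path, sep = ",",
                    append = start > 1, col.names = start == 1,
                    row.names = FALSE)
      }
    }
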
0 votes • 1 answer

How to draw a random sample from an ff object

I want to extract a sample of 1000 values from a large ff object in R. I have tried sample_frac from the dplyr package, but this results in the error below: Error: tbl must be a data frame, not a ffdf object How can I solve this problem?
imtaiky • 191 • 1 • 12

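Since dplyr verbs do not accept ffdf objects, the usual workaround is to sample row indices and subscript the ffdf directly, pulling only those rows into RAM (a sketch; big_ffdf is a placeholder name):

    library(ff)

    set.seed(1)
    idx  <- sample(nrow(big_ffdf), 1000)  # 1000 random row positions
    samp <- big_ffdf[sort(idx), ]         # sorted access reads the disk in order
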
0 votes • 1 answer

biglm - Error: $ operator is invalid for atomic vectors

I am trying to run a generalized linear model on a very large dataset (several million rows). R doesn't seem able to handle the analysis, however, as I keep getting memory allocation errors (unable to allocate vector of size...etc.). The data fit in…
itskj • 15 • 4

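biglm keeps only the model's cross-product matrix in memory, so the rows can be folded in chunk by chunk; the formula and chunk objects below are placeholders:

    library(biglm)

    fit <- biglm(y ~ x1 + x2, data = chunk1)  # fit on the first chunk
    fit <- update(fit, chunk2)                # absorb further chunks
    fit <- update(fit, chunk3)
    summary(fit)
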
0 votes • 1 answer

read.csv.ffdf error: cannot allocate vector of size 6607642.0 Gb

I need to read a 4.5 GB CSV file into RStudio, and to overcome the memory issue I use the read.csv.ffdf function from the ff package. However, I still get an error message that the data is too big: Error: cannot allocate vector of size 6607642.0 Gb and…
JL1118 • 61 • 3

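An allocation request of roughly 6.6 million GB almost always means the file is being parsed wrong (a bad separator or quoting producing an absurd row or column count), not that the data is that big. Probing a few rows with plain read.csv is a cheap sanity check before the ffdf import (the file name and sizes below are placeholders):

    library(ff)

    probe <- read.csv("big.csv", nrows = 5)
    str(probe)   # confirm the columns split as expected

    dat <- read.csv.ffdf(file = "big.csv",
                         first.rows = 10000, next.rows = 100000)
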
0 votes • 0 answers

Unknown error when writing SQL table from R

To write a SQL table to SQL Server I wrote the following function: writeTableSql <- function(table, dat, tablename) { # libraries library(RODBC) library(ETLUtils) library(ff) dat[] <- lapply(dat, function(x) if (is.character(x)) as.factor(x) else…
Ariel • 157 • 1 • 4 • 18

0 votes • 1 answer

Dealing with real numbers wrapped in quotes in the ff library for R

I'm trying to explore the 2017 HMDA data. The flat file is about 9 GB, available here. The CSV is too large to read into memory, so I tried the ff library. However, I am getting errors when I try to read the file. > hmda.ff <- read.csv.ffdf(file =…
oatmilkyway • 429 • 1 • 6 • 17

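Quoted numerics ("123.45") are read as factors unless the class is pinned; colClasses is forwarded to read.csv, so declaring the affected columns numeric usually resolves this. The file and column names below are placeholders for the HMDA schema:

    library(ff)

    hmda.ff <- read.csv.ffdf(
      file = "hmda_2017.csv",
      # placeholders: name each quoted-numeric column explicitly
      colClasses = c(loan_amount = "numeric", state_code = "factor")
    )
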
0 votes • 1 answer

Kriging simulation using the ff package

I'm trying to understand how I can use the ff package to overcome the error "Error: cannot allocate vector of size 1.1 Mb" while running kriging/Gaussian simulation. I don't know how to change the input data. Does anyone have an idea to help me do…
Mohammad • 67 • 1 • 9