Questions tagged [ff]

An R package that provides memory-efficient storage of large data on disk and fast access functions

The ff package provides data structures that are stored on disk but behave (almost) as if they were in RAM by transparently mapping only a section (pagesize) in main memory.

165 questions
1
vote
1 answer

Splitting an ffdf object

I'm using the ff and ffbase libraries to manage a big CSV file (~40 GB and 275e6 observations). I'd like to split/partition this file according to one of its columns (which is a factor column). With a normal data frame, I would do something like this: a…
1
vote
0 answers

R weird error when creating large ff objects

I'm trying to create some large ff objects. According to the documentation, this shouldn't be an issue: ff <- ff(3, dim=c(10000, 4000, 70), filename="test1.ff", vmode="single") This gives me this rather unhelpful error: Error in ff(3, dim = c(10000,…
1
vote
1 answer

R ff, how to add new column/row to existing FF object

I often run into this: I already have a large ff object (represented by a matrix/array) and then I want to add a new column/row to it, as I have some updated data and don't want to create a new big object from scratch (which can be very time…
1
vote
1 answer

log2 transform ff objects

I would like to log2 transform all numeric values in an ff object from the ff package. Using my df: library(ff) df <- 'probeset_id sample1 sample2 sample3 probe_1 1834.2 1743.4 1384 probe_2 4711 4922 4650 probe_3…
user2120870
  • 869
  • 4
  • 16
1
vote
1 answer

Drop columns from ff objects

I would like to drop a column from an ff object. The input file file.txt is tab-delimited, like this: Col1 Col2 Col2 x1 x1 x1 x2 x2 x3 x3 x4 xh I then read it with the ff package: library(ff) df <- read.table.ffdf("file.txt", header=T,…
user2120870
  • 869
  • 4
  • 16
1
vote
1 answer

Adding a column with character data to a ffdf

I've tried to add a Source column to my ffdf, but can't seem to get it to work... if it was a normal df I would simply write mtcars$NewCol <- "AB" If I do this for the ffdf it returns an error require(ff) require(ffbase) mtcarsff <-…
Jacob Odom
  • 216
  • 1
  • 8
1
vote
1 answer

Using apply function with ff package in R

I am trying to apply a given function to the columns in a "as.ffdf" object but I haven't had any luck. Can anyone provide suggestions to the below? n = 3711 and myProbDensity has dimensions of 95248 rows and 3711 columns. myDF <-…
user5087936
1
vote
2 answers

Reading very large fixed(ish) width format txt files from SQL Server Export into R data.tables or likewise

I'm trying to read in (and eventually merge/link/manipulate) a series of large (~300M) and very large (~4G) fixed width files for eventual regressions, visualizations, etc., and am hitting some snags. First, the format of the files themselves is odd…
Mike Dolan Fliss
  • 217
  • 2
  • 11
1
vote
1 answer

Error writing large matrix using R ff

I've tried to generate a matrix using ff package, but I get the following error: Matrixff <- ff(0, dim = c(1000, 10000)) Error in splitPathFile(x) : 4 arguments passed to .Internal(nchar) which requires 3 How can I solve that?
Israel
  • 260
  • 3
  • 15
1
vote
0 answers

Routine matrix functions in ff

I am new to dealing with big matrices in R. I am trying to learn with ff. I can create large ff matrices ffsdist1 and ffsdist2 as follows. library(stringdist) library(babynames) library(ff) d <- babynames sdist1 <- stringdistmatrix(d$name[1:1000],…
Crops
  • 5,024
  • 5
  • 38
  • 65
1
vote
1 answer

Why summarise in ffbase2 (dplyr_ffbase) shows "error in as.vmode.default() (list) object cannot be coerced to type 'double'"?

I have a large (23 million rows) ffdf table (tbl_ffdf) with 10 columns: 7 are factors and 3 contain numbers. It looks something like this: TABLE_bad F1 F2 F3 F4 F5 F6 F7 N1 N2 N3 1111 01.15 05.14 busns…
inscaven
  • 2,514
  • 19
  • 29
1
vote
1 answer

How to work with a large multi type data frame in Snow R?

I have a large data.frame of 20M lines. This data frame is not only numeric; it contains characters as well. Using a divide-and-conquer approach, I want to split this data frame so it can be processed in parallel using the snow package (parLapply function,…
1
vote
1 answer

ff Package in R: is "no diskspace" error a permissions issue?

I'm using the ff package in RStudio, which is running on a Windows server in my department. I'm using it to work with some large datasets, which I'm also storing on a network drive. I have confirmed that I have full read/write access to the drive…
thetanman6
  • 23
  • 3
1
vote
0 answers

How to delete (or select) specified rows or columns of ff matrix, or to subset ff matrix?

An ff matrix of 300,000 rows and 1000 columns: x <- ff(1:100000000, vmode = "integer", dim = c(300000, 1000), dimorder = c(2, 1)) I want to delete the last row of the matrix using the command: x[-300000,] However, I got the error: "can not allocate…
1
vote
2 answers

duplicated function fails for ff date vectors

I am trying to remove duplicates from an ff vector that contains dates, using the duplicated function of the ffbase package and the following code: v1 <- c("24-Mar-94", "24-Mar-94", "27-Mar-94", "28-Jun-1986", "29-Jul-1988", "28-Jun-1986",…
NickD1
  • 393
  • 1
  • 4
  • 14