
I have a script that creates an ffdf object:

library(ff)
library(ffbase)

setwd("D:/My_package/Personal/R/reading")
x<-cbind(rnorm(1:100000000),rnorm(1:100000000),1:100000000)
system.time(write.csv2(x,"test.csv",row.names=FALSE))

system.time(x <- read.csv2.ffdf(file="test.csv", header=TRUE,         first.rows=1000, next.rows=10000,levels=NULL)) 

Now I want to increase column #1 of x by 5.
To perform this operation I use the add() method of the ff package:

add(x[,1],5)

The output is OK (column #1 is increased by 5), but the extra RAM allocation is disastrous: it looks as if I am operating on the entire data frame in RAM rather than on an ffdf object.

So my question is about the correct way to deal with the elements of an ffdf object without drastic extra RAM allocation.


2 Answers


I have used a chunk-based approach to do the arithmetic without extra RAM overhead (see the initial script in the question section):

chunk_size <- 100                            # actually the number of chunks, see below
chunks <- chunk(x, length.out = chunk_size)

system.time(
    for (i in seq_along(chunks)) {
      # read one chunk into RAM, add 5 to column #1, write it back
      x[chunks[[i]], ][[1]] <- x[chunks[[i]], ][[1]] + 5
    }
)
x

Now each element of column #1 of the x object has been increased by 5 without significant RAM allocation.

The 'chunk_size' variable actually controls the number of chunks: the more chunks are used, the smaller the RAM overhead, but processing time can grow accordingly (see the timing sketch below).
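
A minimal sketch of such a timing comparison (my illustration, not part of the approach itself; it assumes the x ffdf from the question is loaded and simply re-runs the loop for several chunk counts):

library(ff)
library(ffbase)

# Each pass adds 5 to column #1 again, so only the relative
# timings of the three passes are of interest here.
for (n_chunks in c(10, 100, 1000)) {
  chunks <- chunk(x, length.out = n_chunks)
  elapsed <- system.time(
    for (i in seq_along(chunks)) {
      x[chunks[[i]], ][[1]] <- x[chunks[[i]], ][[1]] + 5
    }
  )["elapsed"]
  cat(n_chunks, "chunks:", elapsed, "seconds\n")
}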

A brief example and explanation of chunks in ffdf can be found here:
https://github.com/demydd/R-for-Big-Data/blob/master/09-ff.Rmd

Anyway, it would be nice to hear alternative approaches.


You can just do the following:

require(ffbase)

x <- ff(1:10)   # an ff vector stored on disk
y <- x + 5      # the addition is handled by ffbase; y is again an ff vector
x
y

ffbase has implemented all the arithmetic operations for ff vectors; see help("+.ff_vector").
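
Applied to the ffdf from the question, that would look like the following sketch (the column name V1 is my assumption, since write.csv2 of an unnamed matrix produces V1/V2/V3 headers; check names(x) first):

library(ff)
library(ffbase)

# x$V1 is an ff vector, so the addition below is performed
# chunkwise without loading the whole column into RAM.
x$V1 <- x$V1 + 5   # "V1" is an assumed column name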