I would like to subset an ffdf object by index, returning another ffdf object.

The help file on subset.ffdf indicates that you can pass a range index (ri) object as an argument, but when I tried:

data_subset <- subset.ffdf(data, ri(1, 1e5))

I got this error:

Error in which(eval(e, nl, envir)) : argument to 'which' is not logical

Per You-Leee's suggestion, I tried passing a logical vector of the index of interest with this code:

n <- length(data[[1]])  # about 10.5 million rows
logical_index <- c(1, 1e5) == seq.int(1, n)
data_subset <- subset(data, logical_index)

I ran it twice, and each time RStudio crashed with the message "R encountered a fatal error. The session was terminated." At first I thought it might be a memory constraint, but according to Activity Monitor I still have 4 GB available out of 8 GB. Besides, this shouldn't be loading much into memory anyway.

travis

1 Answer

The argument has to be logical, so you have to put TRUE on the desired indices and FALSE otherwise:

> data <- ffdf(a = ff(1:12))
> subset.ffdf(data, c(1, 1e5) == seq.int(1, length(data$a)))
ffdf (all open) dim=c(1,1), dimorder=c(1,2) row.names=NULL
ffdf virtual mapping
  PhysicalName VirtualVmode PhysicalVmode  AsIs VirtualIsMatrix     PhysicalIsMatrix
a            a      integer       integer FALSE           FALSE                FALSE
  PhysicalElementNo PhysicalFirstCol PhysicalLastCol PhysicalIsOpen
a                 1                1               1           TRUE
ffdf data
  a
1 1
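
One caveat about the comparison used above: c(1, 1e5) == seq.int(1, n) does not mark rows 1 through 1e5. R recycles the two-element vector against the sequence, so only row 1 comes out TRUE, which is consistent with the dim=c(1,1) result shown. Below is a minimal base R sketch of that recycling and of a logical index that does cover a range. The names n, first, and last are illustrative values, not taken from the question, and the sketch does not touch ff at all, so it says nothing about whether subset.ffdf copes with a vector of 10.5 million elements.

```r
n <- 12L       # illustrative length, matching the 12-row example above
first <- 1L    # hypothetical start of the wanted range
last <- 5L     # hypothetical end of the wanted range

# Recycling: c(first, last) is repeated to length n before comparing,
# so this tests 1 == 1, 5 == 2, 1 == 3, 5 == 4, and so on.
recycled <- c(first, last) == seq_len(n)
which(recycled)

# A logical index that is TRUE exactly on rows first..last.
in_range <- seq_len(n) >= first & seq_len(n) <= last
which(in_range)
```

Here which(recycled) reports a single position, while which(in_range) reports every position in the requested range. A vector built like in_range could then be tried in place of the comparison in the subset call; that last step is an assumption and has not been checked against ff itself.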
You-leee
  • Thank you for answering, You-leee! Please check out my edit to the question. – travis Nov 16 '17 at 08:34
  • I think the fatal error calls for a new question. Since there is no other error message, I would start by debugging the subset function with debug(subset.ffdf), then step as deep into the function calls as possible to determine where the fatal error is thrown. – You-leee Nov 16 '17 at 18:35