
I am trying to convert a standard (RAM) character vector to an ff object (vector). The code below returns an error:

> as.ff(c('a', 'b'))
Error in ff(initdata = initdata, length = length, levels = levels, ordered = ordered,: 
vmode 'character' not implemented

This thread (https://stackoverflow.com/questions/17744525/r-difficulties-facing-with-read-csv-ffdf-physicalmode-and-virtualmode) suggests that ff objects do not accept characters at all, only factors. Still, the below does not work:

> as.ff(c('a', 'b'), vmode = 'factor')
Error in ff(initdata = initdata, length = length, levels = levels, ordered = ordered,:
vmode 'factor' not implemented

The list of implemented vmodes below does not include 'factor' either:

> .vimplemented
  boolean   logical      quad    nibble      byte     ubyte     short    ushort 
     TRUE      TRUE      TRUE      TRUE      TRUE      TRUE      TRUE      TRUE 
  integer    single    double   complex       raw character 
     TRUE      TRUE      TRUE     FALSE      TRUE     FALSE 

So is it possible at all to create an ff vector of characters?

Audrey

2 Answers

3

Currently, pure character vectors are not implemented in ff; factors are. Since c('a', 'b') is a character vector, it cannot be converted to ff directly. Converting a factor to ff, however, works:

require(ff)
class(c('a', 'b'))
[1] "character"
class(factor(c('a', 'b')))
[1] "factor"
as.ff(factor(c('a', 'b')))
ff (open) integer length=2 (2) levels: a b
[1] [2] 
  a   b 
class(as.ff(factor(c('a', 'b'))))
[1] "ff_vector" "ff" 

Note also that only the factor levels are kept in RAM; everything else is stored on disk.
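
As a rough sketch of the full round trip (assuming the ff package is installed; the variable names x, f and back are only illustrative), a character vector can be stored as an ff factor and read back as characters:

```r
library(ff)

x <- c("a", "b", "a")       # ordinary character vector in RAM

# Wrap in factor() first: the integer codes are written to disk,
# while the level table stays in RAM.
f <- as.ff(factor(x))

levels(f)                   # the level table held in RAM

# f[] reads the data back into RAM as an ordinary factor,
# which can then be converted to character again.
back <- as.character(f[])
identical(back, x)
```

This only pays off when the number of distinct strings is small relative to the vector length, since every distinct string becomes a level held in RAM.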

  • Thanks, that works. A related followup: while in read.csv.ffdf() I can define the colClasses in a similar fashion to the below without issues, `as.ffdf()` returns an error: ` > as.ffdf(data.frame( a=letters[1:5],b=1:5 ), colClasses = c('factor', 'numeric')) ` . Why? – Audrey Feb 20 '14 at 17:03
  • because `colClasses` is not an argument which can be supplied to `as.ffdf`. See the documentation of `as.ffdf`: `?as.ffdf` –  Feb 21 '14 at 10:13
  • So that means when you exit, start a new session and load the ffdf, the factor levels are all lost? – qed May 26 '15 at 15:43
  • No, the factor levels are part of what is saved when you ffsave or ffdfsave your ffdf. When you load the ffdf, your factor levels are loaded into RAM; the actual data (for factors, the integer codes) is not in RAM but remains on disk. So your RAM usage is limited to the factor levels. –  May 27 '15 at 07:48
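
For the as.ffdf() follow-up in the comments above, one possible approach (a sketch, assuming the ff package is installed; df and d are illustrative names) is to set the column types in the data.frame itself before converting, since as.ffdf() does not take colClasses:

```r
library(ff)

# Build the character column as a factor up front; recent R versions
# no longer convert strings to factors automatically in data.frame().
df <- data.frame(a = factor(letters[1:5]), b = 1:5)

d <- as.ffdf(df)    # no colClasses needed: column types come from df

dim(d)              # rows and columns of the ffdf
levels(d$a)         # level table of the factor column
```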
1

Just call factor on your variable:

as.ff(factor(c('a', 'b')))
ff (open) integer length=2 (2) levels: a b
[1] [2] 
  a   b 

Internally, factors are integers,

storage.mode(factor(c('a', 'b')))
[1] "integer"

with a levels attribute that maps to the character representation. As you noted, integers are supported by ff.
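
The mapping between integer codes and levels can be seen in base R alone (a small illustration; f, codes and lev are just example names):

```r
f <- factor(c("a", "b", "a"))

codes <- as.integer(f)   # the underlying integer codes: 1 2 1
lev   <- levels(f)       # the level table: "a" "b"

# Indexing the level table by the codes reconstructs the strings.
lev[codes]               # "a" "b" "a"
identical(lev[codes], as.character(f))   # TRUE
```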

James