The error "vmode 'character' not implemented" occurs because of the argument colClasses=c("id"="character") in the code below:

df <- read.csv.ffdf('TenGBsample.csv',
      colClasses=c("id"="character"), VERBOSE=TRUE)

read.table.ffdf 1..1000 (1000) csv-read=0.02sec
Error in ff(initdata = initdata, length = length, levels = levels, ordered = ordered, :
vmode 'character' not implemented

The first column in TenGBsample.csv is 'id' and consists of 30-digit numbers, which exceed the largest number my 64-bit system (Windows) can represent, so I would like to handle them as character. The second column contains small numbers, so it needs no adjustment.

I've checked, and the vmode documentation does list a 'character' mode: http://127.0.0.1:16624/library/ff/html/vmode.html

Qbik

1 Answer

Note the following from help(read.csv.ffdf):

... read.table.ffdf has been designed to behave as much like read.table as possible. However, note the following differences:

  1. character vectors are not supported, character data must be read as one of the following colClasses: 'Date', 'POSIXct', 'factor', 'ordered'. By default character columns are read as factors. Accordingly arguments 'as.is' and 'stringsAsFactors' are not allowed.

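So you cannot read the values in as character. But if the id column in the file already holds numeric values, you could read them in as doubles and re-format them afterward: format(x, scientific = FALSE) prints x in standard notation. Keep in mind that a double carries only about 15 to 17 significant decimal digits, so a 30-digit id comes back unchanged only if it happens to be exactly representable as a double.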
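To check whether a particular id survives the trip through double, a small base-R test is enough. The sketch below is illustrative only: the id value is made up for the demonstration and does not come from the question's file.

```r
# A made-up 30-digit id, kept as text so that no digits are lost yet
id_text <- "123456789012345678901234567890"

# Parsing it as a double rounds it to the nearest representable value
id_double <- as.numeric(id_text)

# Formatting the double back to text in standard (non-scientific) notation
id_back <- format(id_double, scientific = FALSE)

# The round trip does not reproduce the original digits
identical(id_back, id_text)
# [1] FALSE
```

If the same comparison returns TRUE for every id in your data, reading the column as double is lossless for that data; otherwise the low-order digits will change.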

Here's an example data set x where id is numeric and has 30 digits.

library(ff)

x <- data.frame(
    id = (267^12 + (102:106)^12),  
    other = paste0(LETTERS[1:5],letters[1:5])
)
## create a csv file with 'x'
csvfile <- tempPathFile(path = getOption("fftempdir"), extension = "csv")
write.csv(
    format(x, scientific = FALSE), 
    file = csvfile, row.names = FALSE, quote = 2
)    
## read in the data without colClasses
ffx <- read.csv.ffdf(file = csvfile)
vmode(ffx)
#       id     other 
# "double" "integer" 

Now we can coerce ffx to class data.frame with ffx[,] and re-format the id column.

df <- within(ffx[,], id <- format(id, scientific = FALSE))
class(df$id)
# [1] "character"
df
#                               id other
# 1 131262095302921040298042720256    Aa
# 2 131262252822013319483345600512    Bb
# 3 131262428093345052649582493696    Cc
# 4 131262622917452503293152460800    Dd
# 5 131262839257598318815163187200    Ee
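
If the ids have to be preserved digit for digit, the help text quoted above points to another option: read the column as a factor, which stores each id as a text level. The following is a minimal sketch under assumptions, not a tested recipe for the 10 GB file: the three-row CSV and its id values are made up, and a factor with one level per row keeps all levels in memory, which may be impractical when nearly every id is unique.

```r
library(ff)

# Write a tiny made-up CSV whose ids are 30 digits long
csvfile <- tempfile(fileext = ".csv")
writeLines(c(
    "id,other",
    "123456789012345678901234567890,1",
    "987654321098765432109876543210,2",
    "555555555555555555555555555555,3"
), csvfile)

# Read 'id' as a factor; the remaining columns are left to be guessed
ffx <- read.csv.ffdf(file = csvfile, colClasses = c(id = "factor"))

# Pull the factor into RAM and convert its values to character
ids <- as.character(ffx$id[])
class(ids)
```

Because the ids never pass through a numeric type, ids holds the strings as they appear in the file.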
Rich Scriven