Hi, I am trying to remove duplicates from an ff vector that contains dates, using the duplicated function from the ffbase package and the following code:

library(ff)      # as.ff
library(ffbase)  # duplicated method for ff vectors
v1 <- c("24-Mar-94", "24-Mar-94", "27-Mar-94", "28-Jun-1986", "29-Jul-1988", "28-Jun-1986", "15-Jan-1999", "13-Jan-1999")
v1.d <- as.Date(v1, format="%d-%b-%y")
v1.ff <- as.ff(v1.d)
v2 <- v1.ff[!duplicated(v1.ff)]

However, I get the following error:

Error in UseMethod("as.hi") : 
  no applicable method for 'as.hi' applied to an object of class "Date"

Is there any way around this problem without first coercing the ff vector to a RAM object?

NickD1

2 Answers

Try this:

library(ff)
v1 <- c("24-Mar-94", "24-Mar-94", "27-Mar-94", "28-Jun-1986", "29-Jul-1988", "28-Jun-1986", "15-Jan-1999", "13-Jan-1999")
v1.d <- as.Date(v1, format="%d-%b-%y")
v1.ff <- as.ff(v1.d)
# v1.ff[] reads the whole ff vector into RAM, so base duplicated() runs on an ordinary Date vector
v2 <- v1.ff[!duplicated(v1.ff[])]

Output:

> v1.d
[1] "1994-03-24" "1994-03-24" "1994-03-27" "2019-06-28" "2019-07-29" "2019-06-28" "2019-01-15" "2019-01-13"
> v2
[1] "1994-03-24" "1994-03-27" "2019-06-28" "2019-07-29" "2019-01-15" "2019-01-13"

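And it's deduped. Note that the entries written with four-digit years (e.g. "28-Jun-1986") show up as 2019 in the output: the format string uses %y, which reads only the first two digits of the year ("19"). Parsing those entries as intended needs %Y instead.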

ff objects must be explicitly sliced or subset before ordinary R functions can be applied to them. One way is to index with [,] or [] (the latter is enough here, since it's a vector): this returns an in-RAM vector holding all of the elements, and duplicated then works on that vector. Note that this does read the data into RAM.
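A minimal sketch of that idea, step by step. It assumes the ff package is installed and that the vector fits in RAM once materialised, which may not hold for the very large vectors ff is designed for. The names d, d.ff, d.ram, keep and d.unique are illustrative only; ISO-formatted date strings are used so the example does not depend on locale-specific month names.

```r
library(ff)

# Small illustrative Date vector with two duplicated values
d <- as.Date(c("1994-03-24", "1994-03-24", "1994-03-27",
               "1986-06-28", "1986-06-28"))
d.ff <- as.ff(d)

# Empty brackets read every element of the ff vector into RAM
d.ram <- d.ff[]

# Base duplicated() on the RAM copy returns a plain logical vector
keep <- !duplicated(d.ram)

# A plain logical vector is a valid index into the ff vector
d.unique <- d.ff[keep]
```

The key point is that the index passed to the ff vector is an ordinary R logical vector rather than an ff object.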

LyzandeR
  • Also, I find [this](http://cran.r-project.org/web/packages/ff/ff.pdf) very helpful if you want to look up more details about ff. – LyzandeR Dec 23 '14 at 12:43

Use unique to keep only the distinct date strings first, then convert the result to an ff object:

library(ffbase)
v1 <- c("24-Mar-94", "24-Mar-94", "27-Mar-94", "28-Jun-1986", "29-Jul-1988", "28-Jun-1986", "15-Jan-1999", "13-Jan-1999")
# unique() drops the repeated strings; sapply(v1, unique) would return every element unchanged
v1 <- unique(v1)
v1.d <- as.Date(v1, format="%d-%b-%y")
v1.ff <- as.ff(v1.d)
  • Hi, thanks for your response, and sorry for not being entirely clear: this code is a "concise example" of what I am trying to do, and the first two lines only serve that purpose. So assume that v1.ff is a very large vector, created by ffload. I would like to work with just the ff vector, and not with RAM copies of it, if possible. – NickD1 Dec 23 '14 at 11:46