
For each group in a data.table, I want to add a column that repeats the group's minimum (earliest) time stamp on every row of that group. Consider the following data:

library(chron)
library(data.table)
set.seed(12349870)
time.stamp<-chron(c(10000.673,sample(10001:20000,9)))
group<-c(rep(1,5),rep(2,5))
timedata<-data.table(time.stamp=time.stamp,group=group)
timedata

#              time.stamp group
#  1: (05/19/97 16:09:07)     1
#  2: (03/02/21 00:00:00)     1
#  3: (02/20/15 00:00:00)     1
#  4: (12/11/10 00:00:00)     1
#  5: (08/23/10 00:00:00)     1
#  6: (07/22/18 00:00:00)     2
#  7: (06/09/23 00:00:00)     2
#  8: (03/02/13 00:00:00)     2
#  9: (06/04/09 00:00:00)     2
# 10: (12/04/12 00:00:00)     2

The following assignment runs without complaint, but printing the data.table afterwards fails with an error:

timedata[,firstdata:=time.stamp[which.min(time.stamp)],by=group]
timedata
#Error in format.dates(x, format[[1]], origin. = origin., simplify = simplify) :
#unknown date format 

Session info: R version 3.1.1, chron_2.3-45, data.table_1.9.2

  • works fine for me (on 1.9.5, devel version). Probably try updating to 1.9.4, the current CRAN version? – Arun Dec 03 '14 at 19:14
  • @Arun Updated to data.table_1.9.4 and now my assignment by reference works. Thanks. – brorgschlr Dec 03 '14 at 19:20
  • Great! make sure you [read about the bug in automatic indexing in 1.9.4](http://stackoverflow.com/questions/26308072/operator-inconsistent-in-logical-columns-in-data-table), fixed in 1.9.5. You can turn the feature off by following [this comment](http://stackoverflow.com/questions/26308072/operator-inconsistent-in-logical-columns-in-data-table#comment41286824_26308820). – Arun Dec 03 '14 at 19:24
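Following Arun's comments, here is a minimal sketch of guarding a script against older data.table versions and switching automatic indexing off for the session. The option name `datatable.auto.index` is an assumption, taken to be the one the linked comment refers to; check it against the documentation of your installed version.

```r
library(data.table)

# Stop with an error on versions older than 1.9.4, the version the comments
# report as printing the chron column without error
stopifnot(packageVersion("data.table") >= "1.9.4")

# Assumed option name: turns off automatic (secondary) indexing for this session
options(datatable.auto.index = FALSE)
```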

2 Answers


Do you mean it like this?

stopifnot(sessionInfo()$otherPkgs$data.table$Version=="1.9.4")
timedata[,firstdata:=time.stamp[which.min(time.stamp)],by=group]
timedata
#              time.stamp group           firstdata
#  1: (05/19/97 16:09:07)     1 (05/19/97 16:09:07)
#  2: (03/02/21 00:00:00)     1 (05/19/97 16:09:07)
#  3: (02/20/15 00:00:00)     1 (05/19/97 16:09:07)
#  4: (12/11/10 00:00:00)     1 (05/19/97 16:09:07)
#  5: (08/23/10 00:00:00)     1 (05/19/97 16:09:07)
#  6: (07/22/18 00:00:00)     2 (06/04/09 00:00:00)
#  7: (06/09/23 00:00:00)     2 (06/04/09 00:00:00)
#  8: (03/02/13 00:00:00)     2 (06/04/09 00:00:00)
#  9: (06/04/09 00:00:00)     2 (06/04/09 00:00:00)
# 10: (12/04/12 00:00:00)     2 (06/04/09 00:00:00)
  • Sorry, my original MWE named the time column "date", but I changed it to "time.stamp" to avoid confusion with the date function. Did you actually run the above? I get the same error. – brorgschlr Dec 03 '14 at 18:07
  • @Arun pointed out that I was using an older version of data.table. Updated to 1.9.4 and this works. – brorgschlr Dec 03 '14 at 19:26

The following does what I want, though I'd prefer the assignment by reference attempted in my question: this version rebuilds the table with a self-join and then has to rename the duplicated time column.

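# order by group, then by time within each group
setkey(timedata,group,time.stamp)
# join the earliest row of each group back onto every row of that group
timedata<-timedata[timedata[,.SD[1],keyby=group]]

# rename one column; data.table's setnames(dt, oldname, newname)
# does the same rename in a single call
changename<-function(dt,oldname,newname){
   nm<-names(dt)
   pos<-which(nm==oldname)
   stopifnot(length(pos)>0)
   nm[pos]<-newname
   setnames(dt,names(dt),nm)
}

# in the data.table version used here, the joined copy of
# time.stamp comes back named "time.stamp.1"
changename(timedata,"time.stamp.1","firstdata")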
timedata

#     group          time.stamp           firstdata
#  1:     1 (05/19/97 16:09:07) (05/19/97 16:09:07)
#  2:     1 (08/23/10 00:00:00) (05/19/97 16:09:07)
#  3:     1 (12/11/10 00:00:00) (05/19/97 16:09:07)
#  4:     1 (02/20/15 00:00:00) (05/19/97 16:09:07)
#  5:     1 (03/02/21 00:00:00) (05/19/97 16:09:07)
#  6:     2 (06/04/09 00:00:00) (06/04/09 00:00:00)
#  7:     2 (12/04/12 00:00:00) (06/04/09 00:00:00)
#  8:     2 (03/02/13 00:00:00) (06/04/09 00:00:00)
#  9:     2 (07/22/18 00:00:00) (06/04/09 00:00:00)
# 10:     2 (06/09/23 00:00:00) (06/04/09 00:00:00)
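After upgrading, the join and rename are not needed. Here is a minimal sketch, assuming data.table 1.9.4 or later. Because the table is keyed by group and then by time, the first row of each group holds its earliest time stamp, so that value can be assigned by reference with `:=`. The block rebuilds the example data from the question so it runs on its own. Print `timedata` afterwards to inspect the result.

```r
library(chron)
library(data.table)

# Rebuild the example data from the question
set.seed(12349870)
time.stamp <- chron(c(10000.673, sample(10001:20000, 9)))
group <- c(rep(1, 5), rep(2, 5))
timedata <- data.table(time.stamp = time.stamp, group = group)

# Sort by group, then by time within group, so row 1 of each group is its earliest
setkey(timedata, group, time.stamp)

# Copy each group's first (earliest) time stamp onto all its rows, by reference
timedata[, firstdata := time.stamp[1L], by = group]
```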