-1

I have a data set in the following format

ID  DATETIME          VALUE  
1   4/2/2012 10:00      300  
1   5/2/2012 23:00      150  
1   6/3/2012 10:00      650  
2   1/2/2012 10:00      450  
2   2/2/2012 13:00      240  
3   6/5/2012 09:00      340  
3   7/5/2012 23:00      240

I would like to first calculate the time difference from first instance per ID to each subsequent time.

ID  DATETIME          VALUE  DIFTIME(days)  
1   4/2/2012 10:00      300   0  
1   5/2/2012 23:00      150   1.3  
1   6/3/2012 10:00      650   33  
2   1/2/2012 10:00      450   0  
2   2/2/2012 13:00      240   1  
3   6/5/2012 09:00      340   0  
3   7/5/2012 23:00      240   1  

And then I'd like to make this a wide format

ID   0    1    1.3    33  
1    300  na  150  na  650  
2   450  240  na   na   
3   340  240 na na 
agstudy
  • 119,832
  • 17
  • 199
  • 261
Matthew
  • 19
  • 1
  • 1
    (I don't vote) I guess you have been downvoted because 1- you don't give your data in an easy format that others can use and you don't show any effort to resolve the problem. – agstudy Jun 28 '13 at 00:59
  • thanks for letting me know - I will try and improve the data formatting – Matthew Jun 28 '13 at 01:13
  • You can just use `dput(head(dat))` , and you get a structure like mine below(see my answer) – agstudy Jun 28 '13 at 01:16

1 Answers1

1

Here a solution using data.table and reshape2 packages:

library(data.table)
DT <- as.data.table(dat)
DT[, `:=`(DIFTIME, c(0, diff(as.Date(DATETIME)))), by = "ID"]
##    ID VALUE            DATETIME DIFTIME
## 1:  1   300 2012-02-04 10:00:00       0
## 2:  1   150 2012-02-05 23:00:00       1
## 3:  1   650 2012-03-06 10:00:00      30
## 4:  2   450 2012-02-01 10:00:00       0
## 5:  2   240 2012-02-02 13:00:00       1
## 6:  3   340 2012-05-06 09:00:00       0
## 7:  3   240 2012-05-07 23:00:00       1

library(reshape2)

dcast(formula = ID ~ DIFTIME, data = DT[, list(ID, DIFTIME, VALUE)])
##   ID   0   1  30
## 1  1 300 150 650
## 2  2 450 240  NA
## 3  3 340 240  NA

data in handy format

Here my dat:

structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L), DATETIME = structure(c(1328346000, 
1328479200, 1331024400, 1328086800, 1328184000, 1336287600, 1336424400
), class = c("POSIXct", "POSIXt"), tzone = ""), VALUE = c(300L, 
150L, 650L, 450L, 240L, 340L, 240L)), .Names = c("ID", "DATETIME", 
"VALUE"), class = "data.frame", row.names = c(NA, 7L))
agstudy
  • 119,832
  • 17
  • 199
  • 261
  • unable to replicate this answer - I think because I have non-unique values in the VALUE column - my sample data – Matthew Jul 08 '13 at 00:00
  • structure(list(ID = c(1L, 1L, 2L, 2L, 3L, 3L), DATETIME = structure(c(1334187600, 1334614800, 1371419340, 1371680820, 1333579740, 1334528400), class = c("POSIXct", "POSIXt"), tzone = ""), VALUE = c(600L, 3600L, 850L, 5050L, 2850L, 2850L)), .Names = c("ID", "DATETIME", "VALUE"), row.names = c(NA, 6L), class = "data.frame") – Matthew Jul 08 '13 at 00:01
  • any help greatly appreciated - the above solution from agstudy worked if all the VALUE were unique - I get the dcast error "Aggregation function missing: defaulting to length" and lose the values of VALUE – Matthew Jul 08 '13 at 00:02