-3

i have a time series dataset containing 10000 rows with 1 year of data which looks like this

 2012-01-01 06:23:02    c   d10
 2012-01-01 08:12:12    d   d2
 ...........................
 2012-12-31 08:22:24    s   d5

it has 3 fields

  date_time, category1, category2 where category1 contains 4 categorical values (c,v,d,s) category2 contains 10 categorical values(d1....d10).

i want to calculate the individual count of all the categorical values c,v,d,s with respect each categorical values d1......d10. it should be like how many c,v,d,s present for d1,d2....d10 with respect to the time frame 0-1, 1-2, .... 22-23

how to represent the above data in a time series starting from 1-2, 2-3, 3-4,.....23-24

sample output should be like this

               1-2               2-3   3-4   ........23-24

d1 c=2,d=3,v=3s=4

d2 c=3 d=3,v=2,s=2
..................

d10
i have tried using lubridate,data.table packages but couldn't find out the expected solution

mubarak
  • 77
  • 11

1 Answers1

0

Not clear about the expected result. May be this helps:

indx <- with(dat1, as.numeric(format(as.POSIXct(cut(date_time,
                                          breaks='hour')),'%H')))
dat1$indx1 <- interaction(indx, indx+1, sep="-", 
                           lex.order=TRUE, drop=TRUE)
dat1$date_time <- as.character(dat1$date_time)
library(reshape2)
res1 <- dcast(dat1, category1+category2~indx1, value.var='date_time')
res1[,-(1:2)] <- lapply(res1[,-(1:2)], as.POSIXct)

 head(res1,2)
 #  category1 category2  0-1                 1-2  2-3  3-4  4-5  5-6  6-7  7-8
 #1        c1        d1 <NA> 2012-01-03 01:43:02 <NA> <NA> <NA> <NA> <NA> <NA>
 #2        c1       d10 <NA>                <NA> <NA> <NA> <NA> <NA> <NA> <NA>
 #   8-9                9-10 10-11               11-12 12-13 13-14 14-15 15-16
 #1 <NA> 2012-01-01 09:13:02  <NA>                <NA>  <NA>  <NA>  <NA>  <NA>
 #2 <NA> 2012-01-02 09:43:02  <NA> 2012-01-02 11:03:02  <NA>  <NA>  <NA>  <NA>
 #  16-17 17-18 18-19 19-20 20-21 21-22 22-23 23-24
 #1  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
 #2  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>

Update

If you want the counts

res2 <- dcast(dat1, category1+category2~indx1, value.var='date_time', length)
res2[1:3,1:3]
#   category1 category2 0-1
#1        c1        d1   0
#2        c1       d10   0
#3        c1       d11   0

data

set.seed(24)
dat1 <- data.frame(date_time=seq(as.POSIXct('2012-01-01 06:23:02', 
      '%Y-%m-%d %H:%M:%S'), length.out=300, by='10 min'), category1 =
 sample(paste0('c',1:20), 300, replace=TRUE), category2 = 
sample(paste0('d', 1:20), 300, replace=TRUE))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • thank u @akrun for the suggestion. but thats not the output i expected. i have modified the question for better understanding....pls help me – mubarak Nov 10 '14 at 12:53