0

I need to aggregate the timestamp data from milliseconds to daily..

sample data:

System_ID  n_iter  value      Timestamp
1  xXXXXXXXxx    1      511 2010-05-01 03:29:37.21
2  xXXXXXXXxx    2      513 2010-05-01 03:52:07.56
3  xXXXXXXXxx    3      234 2010-05-01 05:13:54.24
4  xXXXXXXXxx    4      112 2010-05-01 05:41:37.42
5  xXXXXXXXxx    5      173 2010-05-01 06:53:19.01
6  xXXXXXXXxx    6      112 2010-05-01 07:02:53.62
7  xXXXXXXXxx    7      232 2010-05-01 07:35:33.86
8  xXXXXXXXxx    8      776 2010-05-01 07:50:08.14
9  xXXXXXXXxx    9      887 2010-05-01 08:32:55.5
10 xXXXXXXXxx    10     332 2010-05-01 08:56:44.96
11 xXXXXXXXxx    11     778 2010-05-01 09:17:22.57
12 xXXXXXXXxx    12     998 2010-05-01 10:16:26.57
13 xXXXXXXXxx    13     300 2010-05-02 03:33:40.4
14 xXXXXXXXxx    14     400 2010-05-02 03:58:26.44
15 xXXXXXXXxx    15     400 2010-05-02 04:00:23.98

I need to aggregate data that is there in milliseconds to daily in R. Output should looklike below:

System_ID  n_iter  value             Timestamp
xxxxxxxx    12    471.5 (Avg)         2010-05-01
Coolsun
  • 189
  • 9
  • Could you at least tag appropriate languages or technologies in which you want this to be done? – Mike Doe Dec 11 '18 at 21:14
  • Which database? – Caius Jard Dec 11 '18 at 21:14
  • Sure, I need it in R please – Coolsun Dec 11 '18 at 21:18
  • SO is not a code-writing service. Where is your attempt at the solution? – Parfait Dec 11 '18 at 22:03
  • Dear Parfait, I tried with following code but not getting expected result.. library(data.table) z <- zoo(temp$value, as.numeric(temp$timestamp)) z.day <- aggregate(z, trunc, mean) z.day<-as.data.frame(z.day) setDT(z.day, keep.rownames = TRUE)[] z.day$date<-as.POSIXct(as.numeric(as.character(z.day$rn)),origin="2018-05-01",tz="GMT") – Coolsun Dec 12 '18 at 07:06
  • How do I confirm or ratify that both the solutions below worked for me...please advise! – Coolsun Dec 12 '18 at 07:33

2 Answers2

1

I assume that you can read the Timestamp column to a R data.frame as a character at least (depending on how you read in the data it might be possible to do better). In such a setting you can get the example output using the dplyr package.

df <- data.frame(
  System_ID = rep('xXXXXXXXxx', 15)
  , n_iter = seq(1,15)
  , value = c(511, 513, 234, 112, 173, 112, 232, 778, 887, 332, 778, 998, 300, 400, 400)
  , Timestamp = c('2010-05-01 03:29:37.21', '2010-05-01 03:52:07.56', '2010-05-01 05:13:54.24', '2010-05-01 05:41:37.42', '2010-05-01 06:53:19.01', '2010-05-01 07:02:53.62', '2010-05-01 07:35:33.86', '2010-05-01 07:50:08.14', '2010-05-01 08:32:55.5', '2010-05-01 08:56:44.96', '2010-05-01 09:17:22.57', '2010-05-01 10:16:26.57', '2010-05-02 03:33:40.4', '2010-05-02 03:58:26.44', '2010-05-02 04:00:23.98')
) 

df %>%
  dplyr::mutate(
    Timestamp = as.Date(strptime(Timestamp, format = '%Y-%m-%d %H:%M:%OS'))
  ) %>%
  dplyr::group_by(System_ID, Timestamp) %>%
  dplyr::summarise(
    n_iter = max(n_iter)
    , value = mean(value)
  ) %>%
  dplyr::ungroup()
Plamen Petrov
  • 317
  • 1
  • 5
1

Here is a solution:

  • First convert Timestamp to the POSIXct format (which is not necessary here) and create a day column.

     # convert Timestamp to POSIXct
     df$Timestamp <- as.POSIXct(df$Timestamp, format = "%Y-%m-%d %H:%M:%OS")
     # create a "day" column
     df$day <- format(df$Timestamp, "%Y-%m-%d")
    
  • Then, we retrieve the daily aggregates looping using vapply (here many other possibilities exist)

     # retrieve the required daily data
     t(vapply(seq_along(unique(df$day)), 
             function (k) c(unique(df$System_ID[df$day == unique(df$day)[k]]),    
                            max(df$n_iter[df$day == unique(df$day)[k]]),
                            mean(df$value[df$day == unique(df$day)[k]]),
                            unique(df$day)[k]),
             character(4)))
    
  • Finally, here is the result

          [,1]         [,2] [,3]               [,4]        
     [1,] "xXXXXXXXxx" "12" "471.666666666667" "2010-05-01"
     [2,] "xXXXXXXXxx" "15" "366.666666666667" "2010-05-02"
    

Note that it returns a character matrix as the function within vapply returns a vector (i.e. must be of one class). To have columns 2 and 3 of the result as numeric one either apply as.numeric or modify the function in vapply.


Of course, the above assumes:

  1. There is only one System_ID (per day at least) and
  2. The desired n_iter value is its daily max.

So bear that in mind as vapply might bet quite efficient and convenient, it can also be rigid.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
niko
  • 5,253
  • 1
  • 12
  • 32