Convert milliseconds data to daily data by aggregating corresponding values

Question

I need to aggregate the timestamp data from milliseconds to daily..

sample data:

System_ID  n_iter  value      Timestamp
1  xXXXXXXXxx    1      511 2010-05-01 03:29:37.21
2  xXXXXXXXxx    2      513 2010-05-01 03:52:07.56
3  xXXXXXXXxx    3      234 2010-05-01 05:13:54.24
4  xXXXXXXXxx    4      112 2010-05-01 05:41:37.42
5  xXXXXXXXxx    5      173 2010-05-01 06:53:19.01
6  xXXXXXXXxx    6      112 2010-05-01 07:02:53.62
7  xXXXXXXXxx    7      232 2010-05-01 07:35:33.86
8  xXXXXXXXxx    8      776 2010-05-01 07:50:08.14
9  xXXXXXXXxx    9      887 2010-05-01 08:32:55.5
10 xXXXXXXXxx    10     332 2010-05-01 08:56:44.96
11 xXXXXXXXxx    11     778 2010-05-01 09:17:22.57
12 xXXXXXXXxx    12     998 2010-05-01 10:16:26.57
13 xXXXXXXXxx    13     300 2010-05-02 03:33:40.4
14 xXXXXXXXxx    14     400 2010-05-02 03:58:26.44
15 xXXXXXXXxx    15     400 2010-05-02 04:00:23.98

I need to aggregate data that is there in milliseconds to daily in R. Output should looklike below:

System_ID  n_iter  value             Timestamp
xxxxxxxx    12    471.5 (Avg)         2010-05-01

Could you at least tag appropriate languages or technologies in which you want this to be done? — Mike Doe, Dec 11 '18 at 21:14
SO is not a code-writing service. Where is your attempt at the solution? — Parfait, Dec 11 '18 at 22:03
Dear Parfait, I tried with following code but not getting expected result.. library(data.table) z <- zoo(temp$value, as.numeric(temp$timestamp)) z.day <- aggregate(z, trunc, mean) z.day<-as.data.frame(z.day) setDT(z.day, keep.rownames = TRUE)[] z.day$date<-as.POSIXct(as.numeric(as.character(z.day$rn)),origin="2018-05-01",tz="GMT") — Coolsun, Dec 12 '18 at 07:06
How do I confirm or ratify that both the solutions below worked for me...please advise! — Coolsun, Dec 12 '18 at 07:33

score 1 · Accepted Answer · answered Dec 11 '18 at 22:05

I assume that you can read the Timestamp column to a R data.frame as a character at least (depending on how you read in the data it might be possible to do better). In such a setting you can get the example output using the dplyr package.

df <- data.frame(
  System_ID = rep('xXXXXXXXxx', 15)
  , n_iter = seq(1,15)
  , value = c(511, 513, 234, 112, 173, 112, 232, 778, 887, 332, 778, 998, 300, 400, 400)
  , Timestamp = c('2010-05-01 03:29:37.21', '2010-05-01 03:52:07.56', '2010-05-01 05:13:54.24', '2010-05-01 05:41:37.42', '2010-05-01 06:53:19.01', '2010-05-01 07:02:53.62', '2010-05-01 07:35:33.86', '2010-05-01 07:50:08.14', '2010-05-01 08:32:55.5', '2010-05-01 08:56:44.96', '2010-05-01 09:17:22.57', '2010-05-01 10:16:26.57', '2010-05-02 03:33:40.4', '2010-05-02 03:58:26.44', '2010-05-02 04:00:23.98')
) 

df %>%
  dplyr::mutate(
    Timestamp = as.Date(strptime(Timestamp, format = '%Y-%m-%d %H:%M:%OS'))
  ) %>%
  dplyr::group_by(System_ID, Timestamp) %>%
  dplyr::summarise(
    n_iter = max(n_iter)
    , value = mean(value)
  ) %>%
  dplyr::ungroup()

score 1 · Answer 2 · edited Nov 08 '20 at 09:40

Here is a solution:

First convert Timestamp to the POSIXct format (which is not necessary here) and create a day column.

 # convert Timestamp to POSIXct
 df$Timestamp <- as.POSIXct(df$Timestamp, format = "%Y-%m-%d %H:%M:%OS")
 # create a "day" column
 df$day <- format(df$Timestamp, "%Y-%m-%d")

Then, we retrieve the daily aggregates looping using vapply (here many other possibilities exist)

 # retrieve the required daily data
 t(vapply(seq_along(unique(df$day)), 
         function (k) c(unique(df$System_ID[df$day == unique(df$day)[k]]),    
                        max(df$n_iter[df$day == unique(df$day)[k]]),
                        mean(df$value[df$day == unique(df$day)[k]]),
                        unique(df$day)[k]),
         character(4)))

Finally, here is the result

      [,1]         [,2] [,3]               [,4]        
 [1,] "xXXXXXXXxx" "12" "471.666666666667" "2010-05-01"
 [2,] "xXXXXXXXxx" "15" "366.666666666667" "2010-05-02"

Note that it returns a character matrix as the function within vapply returns a vector (i.e. must be of one class). To have columns 2 and 3 of the result as numeric one either apply as.numeric or modify the function in vapply.

Of course, the above assumes:

There is only one System_ID (per day at least) and
The desired n_iter value is its daily max.

So bear that in mind as vapply might bet quite efficient and convenient, it can also be rigid.

Convert milliseconds data to daily data by aggregating corresponding values

2 Answers2