1

I have a data frame for every second. From the data frame with 1 second interval, I managed to aggregate the data into 1-minute interval using the following code:

agg_cont <- df %>% group_by(Date, Hour, Minute, Status, Mean) %>% count(name = 'Occurrence')

Now I have a data frame as shown below,

Date Hour Minute Status Mean Occurrence
12/01/2022 00 00 a 20 60
12/02/2022 00 01 b 32 60
12/01/2022 00 02 a 21 60
12/02/2022 00 03 a 12 60
12/01/2022 00 04 a 23 20
12/01/2022 00 04 b 43 40
12/01/2022 00 05 a 33 60

Please note that the column 'Occurrence' denotes the number of seconds the status occurred in the specific minute. For the minute '04', if the occurrence for status 'a' and 'b' is 20 and 40, respectively, then status 'a' occurred for 20 sec in the particular minute.

With the above dataframe, I would like to have only one row for each minute and create new columns for each 'status' and the mean value that occurred in that particular minute.

Desired output:

Date Hour Minute a b Mean 'a' Mean'b'
12/01/2022 00 00 60 0 20 NA
12/02/2022 00 01 0 60 32 NA
12/01/2022 00 02 60 0 21 NA
12/01/2022 00 03 0 60 12 NA
12/01/2022 00 04 20 40 23 43
12/02/2022 00 05 60 0 33 NA

I am trying to use the dcast function to get the desired output.

Thanks

Karthik
  • 117
  • 7

2 Answers2

2

Here is a tidyverse way.

agg_count <- read.table(text = "
Date    Hour    Minute  Status  Mean    Occurrence
12/01/2022  00  00  a   20  60
12/02/2022  00  01  b   32  60
12/01/2022  00  02  a   21  60
12/02/2022  00  03  a   12  60
12/01/2022  00  04  a   23  20
12/02/2022  00  04  b   43  40
12/02/2022  00  05  a   33  60
", header = TRUE)

suppressPackageStartupMessages({
  library(dplyr)
  library(tidyr)
})

agg_count %>%
  pivot_wider(
    id_cols = c(Date, Hour, Minute),
    names_from = Status,
    values_from = c(Occurrence, Mean),
    values_fill = 0L
  )
#> # A tibble: 7 × 7
#>   Date        Hour Minute Occurrence_a Occurrence_b Mean_a Mean_b
#>   <chr>      <int>  <int>        <int>        <int>  <int>  <int>
#> 1 12/01/2022     0      0           60            0     20      0
#> 2 12/02/2022     0      1            0           60      0     32
#> 3 12/01/2022     0      2           60            0     21      0
#> 4 12/02/2022     0      3           60            0     12      0
#> 5 12/01/2022     0      4           20            0     23      0
#> 6 12/02/2022     0      4            0           40      0     43
#> 7 12/02/2022     0      5           60            0     33      0

Created on 2023-03-27 with reprex v2.0.2


As I asked in comment to the question, if the date is not to be considered, then the code below will put input rows with equal minutes in the same output rows.

agg_count %>%
  pivot_wider(
    id_cols = c(Hour, Minute),
    names_from = Status,
    values_from = c(Occurrence, Mean),
    values_fill = 0L
  )
#> # A tibble: 6 × 6
#>    Hour Minute Occurrence_a Occurrence_b Mean_a Mean_b
#>   <int>  <int>        <int>        <int>  <int>  <int>
#> 1     0      0           60            0     20      0
#> 2     0      1            0           60      0     32
#> 3     0      2           60            0     21      0
#> 4     0      3           60            0     12      0
#> 5     0      4           20           40     23     43
#> 6     0      5           60            0     33      0

Created on 2023-03-27 with reprex v2.0.2

Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
  • Many thanks for your quick help. I am getting the following error "Error in `pivot_wider_spec()`: ! Can't convert `fill` to . " When I tried your example, it works perfectly. But upon applying the code to my dataset, I get the above error – Karthik Mar 27 '23 at 20:20
  • I added "values_fn=mean" to the code you suggested and got the results. Many thanks for your help. – Karthik Mar 27 '23 at 20:54
0

Updated answer:

 agg_count %>%
   pivot_wider(
     id_cols = c(Date, Hour, Minute),
     names_from = Status,
     values_from = c(Occurrence, Mean),
     values_fn = sum,
     values_fill = 0L  
   )
Karthik
  • 117
  • 7