
I have CO2 measurement data from 30 sensors that don't all measure at the same time, nor do they all start at exactly the same time. I would like to align them as well as possible, so I thought taking 10s averages might be a good solution.

In a previous question (Group by multiple variables and summarise dplyr) I cut the time into 10s chunks for each sensor and averaged each sensor's readings over those chunks. That sounds OK, but what I've realised is that the following code starts the bins at whatever time each sensor begins, so the sensors are still not aligned. How can I align them?

require(tidyverse)
require(lubridate)

df <- df %>%
  group_by(Sensor, BinnedTime = cut(DeviceTime, breaks = "10 sec")) %>%
  mutate(Concentration = mean(calCO2)) %>%
  ungroup()

head(df)
# A tibble: 6 x 7
# Groups: BinnedTime [1]

  Sensor Date       Time   calCO2 DeviceTime          cuts   BinnedTime         
  <fctr> <date>     <time>  <dbl> <dttm>              <fctr> <chr>              
1 N1     2019-02-12 13:24     400 2019-02-12 13:24:02 (0,10] 2019-02-12 13:24:02
2 N1     2019-02-12 13:24     400 2019-02-12 13:24:02 (0,10] 2019-02-12 13:24:02
3 N1     2019-02-12 13:24     400 2019-02-12 13:24:03 (0,10] 2019-02-12 13:24:03
4 N2     2019-02-12 13:24     400 2019-02-12 13:24:03 (0,10] 2019-02-12 13:24:02
5 N3     2019-02-12 13:24     400 2019-02-12 13:24:03 (0,10] 2019-02-12 13:24:02
6 N3     2019-02-12 13:24     400 2019-02-12 13:24:05 (0,10] 2019-02-12 13:24:04
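
For illustration, a minimal sketch (made-up timestamps, not the real data) of why the bins don't line up: `cut.POSIXt` with breaks = "10 sec" anchors its break points at the earliest timestamp in the vector it is given, not at round 10-second boundaries, so data starting at different seconds gets different bin edges.

library(lubridate)

# two sensors whose first readings are 3 s apart
t1 <- ymd_hms("2019-02-12 13:24:02") + c(0, 4, 9, 14)
t2 <- ymd_hms("2019-02-12 13:24:05") + c(0, 4, 9, 14)

# each vector gets breaks anchored at its own minimum time
levels(cut(t1, breaks = "10 sec"))  # bins start at 13:24:02
levels(cut(t2, breaks = "10 sec"))  # bins start at 13:24:05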

Edit

I've tried:

dt <- seq(
  from=as.POSIXct("2019-02-12 13:24:00", tz="GMT"),
  to=as.POSIXct("2019-02-12 14:00:00", tz="GMT"),
  by="10 sec"
) 

cut(df$BinnedTime, dt)

but it gives an error saying 'x' must be numeric, so I converted both df$BinnedTime and dt to numeric, and this produces only NAs:

cut(as.numeric(as.POSIXct(df$BinnedTime)), as.numeric(dt))

What am I missing?
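
A guess at what is going wrong here (assuming BinnedTime is the character/factor column created by the earlier cut): `cut()` dispatches on the class of its first argument, so a character or factor column hits `cut.default`, which expects numerics. Cutting the POSIXct DeviceTime column against dt directly should work; and if the numeric route still returns only NAs, a time-zone mismatch between DeviceTime (parsed in the local time zone) and dt (built with tz = "GMT") could push every value outside the break range. A minimal sketch:

# cut() the POSIXct column, not the character labels from the earlier cut
df$BinnedTime <- cut(df$DeviceTime, breaks = dt)

# timestamps outside 13:24:00-14:00:00 (or in a mismatched time zone) become NA
table(is.na(df$BinnedTime))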

Edit 2

I have the following:

df$DeviceTime <- as.POSIXct(paste(df$Date, df$Time), format = "%Y-%m-%d %H:%M:%S")

df <- df %>%
  mutate(BinnedTime = floor_date(ymd_hms(DeviceTime), unit = "10 sec")) %>%
  group_by(Sensor) %>%
  group_by(BinnedTime, add = TRUE) %>%
  summarize(calCO2 = mean(na.omit(calCO2)))

I think this now does what I'm after, but it's not elegant.
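
For reference, a slightly tidier sketch of the same idea (df_binned is just an illustrative name; DeviceTime is already POSIXct, so the extra ymd_hms() call isn't needed, and the two group_by() calls can be combined):

library(tidyverse)
library(lubridate)

df_binned <- df %>%
  mutate(BinnedTime = floor_date(DeviceTime, unit = "10 sec")) %>%  # round down to the 10 s boundary
  group_by(Sensor, BinnedTime) %>%                                  # one group per sensor and bin
  summarize(calCO2 = mean(calCO2, na.rm = TRUE)) %>%
  ungroup()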

Here is the data file on OneDrive: df.txt (until 30th March '19).

HCAI

  • It seems `cut.Date` allows you to specify your cut points explicitly. From the help page `?cut.Date`: *breaks: a vector of cut points or number giving the number of intervals which x is to be cut into or an interval specification...*. Note: `cut.Date` is the function dispatched by `cut` when `DeviceTime` is an object inheriting from class "POSIXt" or "Date". So maybe you could give it the cut points explicitly as a vector instead of using "10 sec"... – Ramiro Magno Feb 26 '19 at 11:32
  • Thanks. I've created a vector of 10s time intervals between 13:24:00 and 14:00:00 using `seq`, and then tried to use `cut` without success. Any thoughts? `dt <- seq(from=as.POSIXct("2019-02-12 13:24:00", tz="GMT"), to=as.POSIXct("2019-02-12 14:00:00", tz="GMT"), by="10 sec"); cut(df$BinnedTime, breaks=dt)` – HCAI Feb 26 '19 at 12:13
  • Can you share your dataframe `df` (`dput(df)`)? It would be easier to help you. – Ramiro Magno Feb 26 '19 at 13:28
  • I've included a link in the 2nd edit. I think I've managed to code it to do what I'm after, but I'm sure there's a more elegant way. Also I've noticed Sensor=A15 has 215 values after running the dplyr code, not 216 like all the others: `aggregate(data=df, calCO2~Sensor, NROW)`. I think it's because I used `na.omit` in calculating the mean. I'll have to figure this out. – HCAI Feb 26 '19 at 13:52
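
(Regarding the A15 discrepancy mentioned in the comments, a quick check one could run on the binned result — df2 is the name used in the answer below, and the sensor name A15 comes from the comments: count the bins per sensor and list the bin labels A15 is missing. A sensor with no readings at all inside a 10 s window simply gets no row for that bin; na.omit alone would not drop a group, since an all-NA bin still yields a row with NaN.)

# number of 10 s bins per sensor; A15 should stand out if it is short one bin
df2 %>% ungroup() %>% count(Sensor) %>% arrange(n)

# which bin labels exist for other sensors but not for A15
setdiff(unique(as.character(df2$BinnedTime)),
        as.character(df2$BinnedTime[df2$Sensor == "A15"]))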

1 Answer

library(tidyverse)
library(lubridate)  
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date

df <- read_delim("https://gist.githubusercontent.com/ramiromagno/4347eefec2aa36ec94423b75b145fccb/raw/5c1b445686bd014ea3a1f0336433e3b364779766/df.txt", delim = " ", col_types = cols())

df$DeviceTime <- as.POSIXct(paste(df$Date, df$Time), format="%Y-%m-%d %H:%M:%S")

# shared 10 s break points covering the whole measurement window
dt <- seq(
  from = as.POSIXct("2019-02-12 13:24:00", tz = "GMT"),
  to = as.POSIXct("2019-02-12 14:00:00", tz = "GMT"),
  by = "10 sec"
)

# bin every reading with the shared break points, then average per sensor and bin
df %>% 
  mutate(BinnedTime = cut(DeviceTime, breaks = dt)) %>%
  group_by(Sensor) %>%
  group_by(BinnedTime, add = TRUE) %>%
  summarize(calCO2 = mean(na.omit(calCO2))) -> df2

df2
#> # A tibble: 7,557 x 3
#> # Groups:   Sensor [?]
#>    Sensor BinnedTime          calCO2
#>    <chr>  <fct>                <dbl>
#>  1 A1     2019-02-12 13:24:00    400
#>  2 A1     2019-02-12 13:24:10    401
#>  3 A1     2019-02-12 13:24:20    401
#>  4 A1     2019-02-12 13:24:30    401
#>  5 A1     2019-02-12 13:24:40    401
#>  6 A1     2019-02-12 13:24:50    400
#>  7 A1     2019-02-12 13:25:00    400
#>  8 A1     2019-02-12 13:25:10    398
#>  9 A1     2019-02-12 13:25:20    397
#> 10 A1     2019-02-12 13:25:30    394
#> # ... with 7,547 more rows
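
As a possible follow-up (not part of the original answer), one way to confirm that all sensors now share the same bin edges is to spread the binned data to wide format, one column per sensor and one row per 10 s bin (pivot_wider needs tidyr >= 1.0.0; with older tidyr, spread(Sensor, calCO2) does the same job):

df2 %>%
  ungroup() %>%
  pivot_wider(names_from = Sensor, values_from = calCO2)
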
Ramiro Magno
  • Thank you for clarifying this and for putting the data somewhere it can be found again. – HCAI Feb 26 '19 at 15:41