
I have a time series as below:

**Date_time**
2018-06-26 17:19:30
2018-06-26 17:20:40
2018-06-26 17:20:41
2018-06-26 17:20:42
[...]
2018-06-26 17:21:36
2018-06-26 17:21:37
2018-06-26 17:21:38
2018-06-26 17:21:39
2018-06-26 17:23:15

I would like to subsample it so that I obtain the following time series (i.e. remove the locations recorded every second, so as to keep only roughly one location per minute):

**Date_time**
2018-06-26 17:19:30
2018-06-26 17:20:40
2018-06-26 17:21:39
2018-06-26 17:23:15

I wrote the following code, but I do not get the expected time series:

tab_subsampled <- tab %>%
   mutate(Date_Time = ymd_hms(Date_Time), 
          year = year(Date_Time), month = month(Date_Time), day = day(Date_Time), 
          hour = hour(Date_Time), minute = minute(Date_Time), second = second(Date_Time)) %>% 
   group_by(year, month, day, hour, minute) %>%
   slice(n()) %>% 
   ungroup() 

I'd really appreciate some help, thank you very much!

Jujulie

2 Answers


Simply using `sample_n` will also do the job: group by date, hour, and minute, then sample one row per group.

library(dplyr)
library(lubridate)

time<-c("2018-06-26 17:19:30",
        "2018-06-26 17:20:40",
        "2018-06-26 17:20:41",
        "2018-06-26 18:20:42",
        "2018-06-26 17:21:39",
        "2018-06-26 17:23:15",
        "2018-07-26 17:20:30",
        "2018-07-26 17:20:40",
        "2018-08-26 18:20:41",
        "2018-08-26 18:20:42",
        "2018-09-26 17:21:39",
        "2018-09-26 17:21:15")

time <- as.data.frame(time)
time
                  time
1  2018-06-26 17:19:30
2  2018-06-26 17:20:40
3  2018-06-26 17:20:41
4  2018-06-26 18:20:42
5  2018-06-26 17:21:39
6  2018-06-26 17:23:15
7  2018-07-26 17:20:30
8  2018-07-26 17:20:40
9  2018-08-26 18:20:41
10 2018-08-26 18:20:42
11 2018-09-26 17:21:39
12 2018-09-26 17:21:15


set.seed(1)
time %>% group_by(date(time), hour(time), minute(time)) %>%
  sample_n(1) %>% ungroup() %>%
  select(time)
# A tibble: 8 x 1
  time               
  <chr>              
1 2018-06-26 17:19:30
2 2018-06-26 17:20:41
3 2018-06-26 17:21:39
4 2018-06-26 17:23:15
5 2018-06-26 18:20:42
6 2018-07-26 17:20:30
7 2018-08-26 18:20:41
8 2018-09-26 17:21:39

Note: you have to add your other ID/grouping variables to the `group_by()` statement to do this within those groups.
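For instance, if the data also carried a hypothetical `id` column (one series per animal, say; the column name is my assumption, not from the question), the same idea would look like this:

```r
library(dplyr)
library(lubridate)

# Hypothetical two-series data; `id` is an assumed grouping column
df <- data.frame(
  id   = c("A", "A", "B", "B"),
  time = c("2018-06-26 17:20:40", "2018-06-26 17:20:41",
           "2018-06-26 17:20:40", "2018-06-26 17:20:42")
)

set.seed(1)
df %>%
  mutate(time = ymd_hms(time)) %>%
  group_by(id, date(time), hour(time), minute(time)) %>%  # one group per id per minute
  sample_n(1) %>%
  ungroup() %>%
  select(id, time)
# one sampled row per id per minute (2 rows here)
```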

AnilGoyal
  • Thanks for your answer. Nevertheless, it does not work as I would like. 2018-06-26 17:19:30 2018-06-26 17:20:40 2018-06-26 17:20:41 2018-06-26 17:20:42 [...] 2018-06-26 17:21:39 2018-06-26 17:23:15 – Jujulie Apr 30 '21 at 14:58
  • Thanks for your answer. Nevertheless, it gives the same results as my code. Actually, I would like to keep one location per minute but also that the time interval between the successive locations is as close as possible to 60 seconds. I added additional rows in my example so that you could better understand what I mean, hopefully ('17:21:36'; 17:21:37';'17:21:38'; 17:21:39). Using the full time series from 17:20:41 to 17:21:39 with locations recorded every seconds, your code selects at random a location per minute, not with an interval as close as possible to 60 seconds. – Jujulie Apr 30 '21 at 15:27
  • See revised answer. Note: you have to add your other ID/grouping variables to the `group_by()` statement to do this within those groups. – AnilGoyal Apr 30 '21 at 16:24
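Neither answer addresses the refinement in the comments, that the gap between kept locations should be as close as possible to 60 seconds. One way to sketch that (my own greedy approach, not from the thread; the function and argument names are made up): starting from the first fix, repeatedly keep the later fix whose gap to the last kept fix is nearest to the target interval.

```r
library(lubridate)

# Greedy pass: from each kept timestamp, jump to the later timestamp
# whose gap is nearest to `target` seconds.
subsample_near <- function(times, target = 60) {
  times <- sort(times)
  kept_idx <- 1
  i <- 1
  while (i < length(times)) {
    gaps <- as.numeric(difftime(times[(i + 1):length(times)], times[i],
                                units = "secs"))
    i <- i + which.min(abs(gaps - target))  # index of gap nearest to target
    kept_idx <- c(kept_idx, i)
  }
  times[kept_idx]
}

subsample_near(ymd_hms(c("2018-06-26 17:19:30", "2018-06-26 17:20:40",
                         "2018-06-26 17:20:41", "2018-06-26 17:20:42",
                         "2018-06-26 17:21:36", "2018-06-26 17:21:37",
                         "2018-06-26 17:21:38", "2018-06-26 17:21:39",
                         "2018-06-26 17:23:15")))
# keeps 17:19:30, 17:20:40, 17:21:39, 17:23:15 -- the asked-for output
```

On the question's example this reproduces the expected series, including 17:21:39, which a per-minute grouping misses (the 17:21 group's "best" row is the one 59 s after 17:20:40, not the last or a random one).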

You can use `substr` with dplyr on the whole data frame: cut off everything after the minutes, then keep only the unique values, so you end up with one data point per minute. Note that this truncates the timestamps (the seconds are lost).

library(dplyr)

#Date_time
time<-c("2018-06-26 17:19:30",
        "2018-06-26 17:20:40",
        "2018-06-26 17:20:41",
        "2018-06-26 17:20:42",
        "2018-06-26 17:21:39",
        "2018-06-26 17:23:15")

time <- as.data.frame(time)
colnames(time) <- "Date_time"

time<-time %>%
  mutate(Date_time = substr(Date_time, 1, 13))

Date.Time_only_minutes<-unique(time$Date_time);Date.Time_only_minutes
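A variation on the same idea (my sketch, not part of the answer above) that keeps the first full timestamp of each minute instead of truncating it, using `lubridate::floor_date` together with `dplyr::distinct`:

```r
library(dplyr)
library(lubridate)

time <- data.frame(Date_time = c("2018-06-26 17:19:30",
                                 "2018-06-26 17:20:40",
                                 "2018-06-26 17:20:41",
                                 "2018-06-26 17:20:42",
                                 "2018-06-26 17:21:39",
                                 "2018-06-26 17:23:15"))

time %>%
  mutate(Date_time = ymd_hms(Date_time)) %>%
  # one row per minute bin, keeping the first full timestamp in each
  distinct(minute_bin = floor_date(Date_time, "minute"), .keep_all = TRUE) %>%
  select(Date_time)
# 17:19:30, 17:20:40, 17:21:39, 17:23:15 -- seconds preserved
```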
Bananarama
  • Thanks for your answer. Nevertheless, it gives the same results as my code. Actually, I would like to keep one location per minute but also that the time interval between the successive locations is as close as possible to 60 seconds. I added additional rows in my example so that you could better understand what I mean, hopefully ('17:21:36'; 17:21:37';'17:21:38'; 17:21:39). – Jujulie Apr 30 '21 at 15:28