1

I have a data.table which contains date and timestamps for start and end date. I know I can calculate the time difference by using the difftime function. But I want to calculate the time difference and split it into individual hours in each row along with the copy of the rest of the data. for example:

    Start_time                         End_time 

       1: 2017-01-01 00:15:00 2017-01-01 00:23:00   
       2: 2017-01-01 00:24:00 2017-02-01 00:10:00   
       3: 2017-01-01 23:38:00 2017-02-01 00:45:00  
       4: 2017-01-01 14:38:00 2017-01-01 07:43:00    
       5: 2017-01-01 06:38:00 2017-01-01 00:43:00  

i did the following to extract the start hour and end hour

    Start_time                         End_time             start_hour    end hour

   1: 2017-01-01 00:15:00 2017-01-01 00:23:00         0            0
   2: 2017-01-01 00:24:00 2017-02-01 00:10:00         0            0
   3: 2017-01-01 23:38:00 2017-02-01 00:45:00         23           0
   4: 2017-01-01 14:38:00 2017-01-01 07:43:00         14           7
   5: 2017-01-01 06:38:00 2017-01-01 00:43:00          6           0
now, i have to find all the temperatures between start_hour and end_hour inclusive and populate my table 
i have a weather report which is collected hourly with corresponding temperature, 
 **DateTime**               **Temperature**      hour
1  2017-01-01 06:00:00          16                6
2  2017-01-01 05:00:00          16                5
3  2017-01-01 04:00:00          16                4
4  2017-01-01 08:00:00           9                8
5  2017-01-01 03:00:00           5                3           
6  2017-01-01 00:00:00           8                0
7  2017-01-01 07:00:00           5                7

i have to get all the temperatures between the starttime and end time (inclusive)

ie. if a ride is for 4 hours on a day, 2017-01-01 00:05:00 till 2017-01-01 
04:00:00 i need 4 rows with 4 different temperatures 00:00:00 , 01:00:00, 02:00:00, 03:00:00,04:00:00 . 
date should also be considered here ex: 2017-01-01 23:45:00 till 2017-02-01 14:45:00 ,
 i need 15 rows for corresponding temperature.
 i am struggling with issue and couldnot find a solution,kindly help
prathyusha
  • 11
  • 4
  • Try to use ceiling instead of round – Pawel Stradowski Aug 26 '19 at 09:35
  • thank yu, i usd ceiling function also, but my problem is not completely about the time alone. i also want to split the difference in hours into each block of individual rows with the hour , any idea about hpow to do it? – prathyusha Aug 26 '19 at 09:54

1 Answers1

1

I made a reprex from your data and used data.table for non-equi joins. Is this your expected result?

library(data.table)

rides <- as.data.table(structure(list(Trip_id = c("17059131", "17059130", "17059129", 
"17059128", "17059127", "17059126", "17059125", "17061379", "17061378", 
"17063646"), DurationInMinutes = c(8, 12, 17, 5, 5, 20, 12, 48, 
47, 25), Start_time = structure(c(1483229700, 1483230240, 1483230480, 
1483231080, 1483231080, 1483231140, 1483231380, 1483232160, 1483232220, 
1483235640), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    End_time = structure(c(1483230180, 1483230960, 1483231500, 
    1483231380, 1483231380, 1483232340, 1483232100, 1483235040, 
    1483235040, 1483237140), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    start_hour = structure(c(1483228800, 1483228800, 1483228800, 
    1483228800, 1483228800, 1483228800, 1483228800, 1483228800, 
    1483228800, 1483232400), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    end_hour = structure(c(1483228800, 1483228800, 1483228800, 
    1483228800, 1483228800, 1483228800, 1483228800, 1483232400, 
    1483232400, 1483236000), class = c("POSIXct", "POSIXt"), tzone = "UTC")), class = "data.frame", row.names = c(NA, 
-10L)))

weather <- as.data.table(structure(list(weather_time = structure(c(1483228800, 1483232400, 
1483236000), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    temperature = c("9.90", "9.67", "9.33")), class = "data.frame", row.names = c(NA, 
-3L)))

rides_with_temp <- rides[weather, on =.(start_hour <= weather_time, end_hour >= weather_time)] 
rides_with_temp

Code with all transformations if you need them:

library(readxl)
library(lubridate)
library(tidyverse)
library(data.table)

bikerides_startandendtimes <- 
  read_excel("C:/Users/pstra/Downloads/bikerides startandendtimes.xlsx") %>% 
  mutate(Start_time = mdy_hms(Start_time),
         End_time = mdy_hms(End_time),
         start_hour = floor_date(Start_time, unit = "hours"),
         end_hour = floor_date(End_time, unit = "hours")
         ) %>%       
  as.data.table()

weatherdata <- 
  read_excel("C:/Users/pstra/Downloads/weatherdata.xlsx") %>% 
  mutate(weather_time = ymd_h(paste(year, Month, Day, Hour, sep = "/"))) %>% 
  select(weather_time, temperature) %>% 
  as.data.table()

bikerides_startandendtimes[weatherdata, on =.(start_hour <= weather_time, end_hour >= weather_time)] %>%

dput(as.data.frame(head(bikerides_startandendtimes, 10)))
dput(as.data.frame(head(weatherdata, 3)))

Regards Paweł

Pawel Stradowski
  • 807
  • 7
  • 13
  • thank you for the above code, but i think i dint expalin my quesion properly, i have calculated the time difference and extracted the end hour and start hour, but the next part of the question is , – prathyusha Aug 26 '19 at 10:57
  • case 1: if the end_date == start_date & end_time == start_time i dont have any problem , i can take the hour. but, case 2:if end_date == start_date & end_time! == start_time then the difference in the hours should be put into different rows : for example : if start time is 2017-01-04 13:20:00 end time is 2017-01-04 16:25:00 he start _ hour = 13 end_hour = 16 now, i need blocks of one hour from each in different rows or columns 13 , 14, 15 , 16 hours – prathyusha Aug 26 '19 at 11:14
  • case 3: if end_date == start_date & end_time! == start_time then again the hours in different row or columns along with date difference example -: if start is 2017-01-04 13:20:00 and end is 2017-02-04 10:20:00 the differen ce is 22 hours so i eed 22 rows from 13 hours from 2017-01 to 10 o clock 2017-02 . any help how to achieve this? – prathyusha Aug 26 '19 at 11:15
  • I modified my post above, is this your expected result? – Pawel Stradowski Aug 26 '19 at 12:51
  • Thanks for your time, i got the following result when i executed the code above, but its not what i am looking for Start_time End_time time_diff slot 1 2017-01-01 00:15:00 2017-01-01 00:23:00 8 mins 2017-01-01 00:00:00 2 2017-01-01 00:24:00 2017-01-01 00:36:00 12 mins 2017-01-01 00:00:00 11 2017-01-01 13:27:00 2017-01-01 15:20:00 113 mins 2017-01-01 13:00:00 – prathyusha Aug 26 '19 at 15:01
  • i will try to edit my question again and post . this is my first time posting something in stackoverflow. – prathyusha Aug 26 '19 at 15:18
  • please add a sample of your data, including temperature. You can do this using dput: dput(head(your_df,10)) – Pawel Stradowski Aug 26 '19 at 16:53
  • the above two links are for the sample data in excels, please have a look. thanks – prathyusha Aug 26 '19 at 18:27
  • There are 2 data sets, one about temperature, second about bike trips. Tell me, what is your goal? An example: I want to assign temperature :of each bike trip. Step back from all analytical stuff and ask a good question :-) – Pawel Stradowski Aug 26 '19 at 21:06
  • Thanks a lot for your understanding. Yes , I have two datasets, I need to take the temperature from weather report into my bike data set for each hour. Each start and end time stamp reference each trip.so , I have to get information about how the temperature is varying from start time to end time during a trip, if a trip duration is 3 hrs ex: 15.45 is start time , and 18.45 is end time on same day , I need temperature for 15:00 , 16:00 ,17:00 , 18:00. If trip starts 23.45 and extends to next day 02.34 in the morning. Then 23:00, 00:00,01:00, 02:00 hrs respective temperatures r needed – prathyusha Aug 26 '19 at 22:48
  • And if the trip is only for 10 mins and in the same hour same day ex: 13:30 start time and 13.40 end time 2017-03-01 , then temperature for 13 .00 on that day. Here my idea of approach was to join the weather data and bike data with the datetime column as key, provided if I could split the difference between start and end timestamps into different individual blocks of hours, in bike data first and then map it with the weather. I hope I made sense in explaining my issue. – prathyusha Aug 26 '19 at 23:05
  • Always provide reproducible example(reprex), this time I made one for you - see updated post – Pawel Stradowski Aug 27 '19 at 13:36