I am having trouble using the `pad` function from the padr package to fill gaps in a time series. My code downloads hourly data from a server, one day at a time, over a given period. After each day has been downloaded, I want to use `pad` to clean up the data and add any missing date-times, so that the days can be combined without errors.

The function downloads the data as a list, which looks like this:

 time                  temperature
2019-11-11 00:00:00          3
2019-11-11 01:00:00          4 
2019-11-11 03:00:00          5

I would like the gaps to be filled in automatically, so that it looks like this:

 time                  temperature
2019-11-11 00:00:00          3
2019-11-11 01:00:00          4 
2019-11-11 02:00:00          NA
2019-11-11 03:00:00          5

I have used `pad` in the code below to fill in the gaps, but if the data starts at 02:00:00, the padded series also starts at that time step. When I use `start_val` and `end_val`, it seems to have problems recognising the date and time. I have tried a lot of workarounds with no luck. Bear in mind that the date will differ from day to day and there is no way of knowing in advance which hour is missing. Any help would be appreciated.

    if (nrow(daily$hourly) < 24) {
      daily$hourly <- daily$hourly %>%
        pad(daily$hourly$time, start_val = as.POSIXct('00:00:00'), end_val = as.POSIXct('23:00:00')) %>%
        fill_by_value(value)
    }

**Update**

I think the main issue is that R does not recognise 00:00:00 as the start of the time series, so it does not fill in 01:00:00 as a gap. Both solutions work when the gap is somewhere else. Any thoughts? See the structure below.

structure(list(time = structure(c(1521936000, 1521939600, 1521943200, 
1521946800, 1521950400, 1521954000, 1521957600, 1521961200, 1521964800, 
1521968400, 1521972000, 1521975600, 1521979200, 1521982800, 1521986400, 
1521990000, 1521993600, 1521997200, 1522000800, 1522004400, 1522008000, 
1522011600, 1522015200), class = c("POSIXct", "POSIXt"), tzone = ""), 
    summary = c("Overcast", "Overcast", "Overcast", "Overcast", 
    "Overcast", "Overcast", "Overcast", "Foggy", "Mostly Cloudy", 
    "Mostly Cloudy", "Overcast", "Mostly Cloudy", "Mostly Cloudy", 
    "Mostly Cloudy", "Mostly Cloudy", "Mostly Cloudy", "Partly Cloudy", 
    "Partly Cloudy", "Partly Cloudy", "Partly Cloudy", "Partly Cloudy", 
    "Clear", "Clear"), icon = c("cloudy", "cloudy", "cloudy", 
    "cloudy", "cloudy", "cloudy", "cloudy", "fog", "partly-cloudy-day", 
    "partly-cloudy-day", "cloudy", "partly-cloudy-day", "partly-cloudy-day", 
    "partly-cloudy-day", "partly-cloudy-day", "partly-cloudy-day", 
    "partly-cloudy-day", "partly-cloudy-day", "partly-cloudy-day", 
    "partly-cloudy-night", "partly-cloudy-night", "clear-night", 
    "clear-night"), precipIntensity = c(0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L), precipProbability = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L), temperature = c(7.28, 7.3, 7.21, 7.08, 7.03, 7.02, 7.15, 
    7.19, 7.38, 7.83, 8.43, 9.35, 9.89, 10.54, 10.81, 11.07, 
    11.55, 11.31, 10.52, 9.67, 8.67, 7.94, 6.93), apparentTemperature = c(7.28, 
    7.3, 7.21, 7.08, 7.03, 7.02, 7.15, 7.19, 7.38, 7.33, 8.43, 
    9.35, 9.64, 10.54, 10.81, 11.07, 11.55, 11.31, 10.52, 9.67, 
    8.67, 7.94, 6.93), dewPoint = c(4.99, 5.07, 5.03, 4.99, 4.86, 
    5.04, 5.41, 5.6, 5.55, 5.62, 5.57, 5.79, 5.84, 5.7, 5.4, 
    5.08, 4.4, 4.2, 4.37, 4.32, 4.02, 4.06, 3.73), humidity = c(0.85, 
    0.86, 0.86, 0.87, 0.86, 0.87, 0.89, 0.9, 0.88, 0.86, 0.82, 
    0.78, 0.76, 0.72, 0.69, 0.67, 0.61, 0.62, 0.66, 0.69, 0.73, 
    0.76, 0.8), pressure = c(1005.4, 1005.7, 1006, 1006.4, 1006.7, 
    1007.2, 1007.7, 1008.6, 1009.4, 1010.3, 1010.9, 1011.6, 1011.7, 
    1012.1, 1012.2, 1012.3, 1012.4, 1012.6, 1013.3, 1013.8, 1014.5, 
    1014.8, 1015.3), windSpeed = c(0.35, 0.48, 0.55, 0.33, 0.36, 
    0.6, 0.85, 1.05, 1.29, 1.38, 0.89, 1.33, 1.39, 1.44, 1.63, 
    1.57, 1.46, 1.27, 0.57, 0.23, 0.03, 0.27, 0.2), windGust = c(0.48, 
    0.81, 0.95, 0.42, 0.44, 0.96, 1.14, 1.28, 2.03, 1.99, 1.72, 
    2.51, 2.48, 2.66, 2.48, 2.46, 2.42, 1.67, 0.65, 0.27, 0.03, 
    0.27, 0.2), windBearing = c(28L, 6L, 12L, 1L, 12L, 3L, 12L, 
    23L, 40L, 41L, 26L, 22L, 15L, 21L, 9L, 11L, 10L, 18L, 16L, 
    17L, NA, 273L, 284L), cloudCover = c(0.98, 0.98, 0.98, 0.93, 
    0.89, 0.93, 0.97, 0.94, 0.82, 0.83, 0.99, 0.75, 0.75, 0.75, 
    0.75, 0.73, 0.51, 0.49, 0.46, 0.46, 0.44, 0.1, 0), uvIndex = c(0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 2L, 
    1L, 0L, 0L, 0L, 0L, 0L, 0L), visibility = c(6.74, 6.064, 
    6.532, 6.035, 6.054, 6.006, 4.033, 3.047, 4.369, 5.512, 6.856, 
    8.129, 9.269, 9.488, 10.003, 10.003, 10.003, 10.003, 10.003, 
    10.003, 10.003, 10.003, 9.521)), row.names = c(NA, -23L), class = "data.frame")
adamR

2 Answers

1

You can use `complete` from tidyr with an hourly sequence between the minimum and maximum time:

tidyr::complete(df, time = seq(min(time), max(time), by = "1 hour"))

#  time                temperature
#  <dttm>                    <int>
#1 2019-11-11 00:00:00           3
#2 2019-11-11 01:00:00           4
#3 2019-11-11 02:00:00          NA
#4 2019-11-11 03:00:00           5

data

df <- structure(list(time = structure(c(1573401600, 1573405200, 1573412400
), class = c("POSIXct", "POSIXt"), tzone = ""), temperature = 3:5), 
row.names = c(NA, -3L), class = "data.frame")
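
Because `min(time)` and `max(time)` only span the observed range, an hour missing at the very start or end of the day would not be filled. A minimal base R sketch of one way around that, assuming every day should run from 00:00 to 23:00 and that the timestamps are in UTC (both are assumptions, and `day_start`, `full_day` and `padded` are illustrative names):

```r
# Example data: the day starts at 00:00 but 02:00 is missing.
df <- data.frame(
  time = as.POSIXct(c("2019-11-11 00:00:00", "2019-11-11 01:00:00",
                      "2019-11-11 03:00:00"), tz = "UTC"),
  temperature = 3:5
)

# Midnight of the day the data belongs to, taken from the data itself,
# so the date never has to be hard-coded.
day_start <- as.POSIXct(format(min(df$time), "%Y-%m-%d"), tz = "UTC")

# All 24 hourly timestamps for that day.
full_day <- data.frame(time = seq(day_start, by = "1 hour", length.out = 24))

# Left join: every hour is kept, hours without data get NA.
padded <- merge(full_day, df, by = "time", all.x = TRUE)
head(padded, 4)
```

The same `full_day$time` sequence could be passed to `tidyr::complete(df, time = ...)` instead of using `merge`.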
Ronak Shah
  • Thanks Ronak, that is a good way to solve my problem and easier than using pad. I think I still have the same problem as with the pad function, though: it doesn't recognise 00:00:00 as the start, so if the gap is between 00:00:00 and 02:00:00 it won't add 01:00:00. Any thoughts? – adamR Nov 10 '19 at 21:31
  • @adamR Do you have `00:00:00` as a value in the column? Does it hold only a time, or a datetime value as shown in my example? Lastly, it would help if you could share your actual data using `dput(df)`. – Ronak Shah Nov 10 '19 at 23:45
  • Thanks for the response, it is in datetime value as your example. I will dput a little later. Thanks again. – adamR Nov 11 '19 at 09:02
1

padr::pad takes dataframes as its first argument, so it does not work on the vector you are giving it now. All you need to do is:

x <- data.frame(
  time = as.POSIXct(c('2019-11-11 00:00:00','2019-11-11 01:00:00','2019-11-11 03:00:00')),
  temperature = 3:5
)
padr::pad(x)
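
If the first or last hour of the day can also be missing, `pad` accepts `start_val` and `end_val`, which need to be full date-times rather than bare clock times. A hedged sketch, assuming the data is in UTC and each day should cover 00:00 to 23:00; `day`, `start`, `end` and `padded` are illustrative names, not part of the question's code:

```r
library(padr)

# Example day whose first observation is at 02:00.
x <- data.frame(
  time = as.POSIXct(c("2019-11-11 02:00:00", "2019-11-11 03:00:00",
                      "2019-11-11 05:00:00"), tz = "UTC"),
  temperature = 3:5
)

# Build the day's boundaries from the date found in the data.
day   <- format(x$time[1], "%Y-%m-%d")
start <- as.POSIXct(paste(day, "00:00:00"), tz = "UTC")
end   <- as.POSIXct(paste(day, "23:00:00"), tz = "UTC")

# Pad to the full day; hours without observations get NA.
padded <- pad(x, interval = "hour", start_val = start, end_val = end)
nrow(padded)
```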
Edwin
  • Thanks Edwin, I have updated the question, as neither your nor Ronak's solution seems to fill in 01:00:00. I can't get it to work whether the data is in a data frame or a list. – adamR Nov 13 '19 at 19:45
  • It works fine, since this is the night the clock moves from CET (winter time) to CEST (summer time). For this particular night there is no 02:00:00, as it skips to 03:00:00 at that moment. See `head(x$time)` where x is the data you outputted in your edit. – Edwin Nov 15 '19 at 08:37
  • Thanks Edwin, that never occurred to me. I will look into that problem. Thanks – adamR Nov 18 '19 at 09:21
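
The daylight-saving effect described in the last comments can be checked from the epoch values in the question's `dput` output. A minimal sketch, assuming the asker's session time zone is Europe/London (the question does not state it; `tzone = ""` only means the local zone is used, and in a CET zone the skipped hour would be 02:00 instead):

```r
# The 23 timestamps from the question, as seconds since 1970-01-01 UTC.
secs <- seq(1521936000, 1522015200, by = 3600)

# Same instants, displayed in two time zones.
utc    <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
london <- as.POSIXct(secs, origin = "1970-01-01", tz = "Europe/London")

# In real time the series is evenly spaced: every step is 3600 seconds.
unique(diff(secs))

# In UTC the first three clock times are consecutive hours.
format(utc[1:3], "%H:%M")

# In London the clock jumps from 01:00 GMT to 02:00 BST on 2018-03-25,
# so 01:00 never appears on the wall clock and there is no gap to fill.
format(london[1:3], "%H:%M")
```

Working in UTC throughout (for example by setting `tz = "UTC"` when converting the downloaded times) keeps every day at 24 hourly steps.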