I am struggling to fill some NAs in an hourly temperature vector.
Out of 21885 observations, 472 are NA, scattered throughout the series. They should be filled in a way that respects the shape of the temperature curve over the day.
The NAs occur in runs: some are isolated, others come in groups of 2, 3, 4 or more consecutive values. For a short run I could simply carry over the previous or the following value, but that won't work for a long run.
I think an interpolation between the last known value and the next known one would be ideal, but I have no clue how to do this, as I am fairly new to R.
Thank you in advance for your time. Any advice on which function or approach to use would be very much appreciated.
Sample:
mydate <- c("2017-03-23 09:00:00 CET", "2017-03-23 10:00:00 CET", "2017-03-23 11:00:00 CET", "2017-03-23 12:00:00 CET",
            "2017-03-23 13:00:00 CET", "2017-03-23 14:00:00 CET", "2017-03-23 15:00:00 CET", "2017-03-23 16:00:00 CET",
            "2017-03-23 17:00:00 CET", "2017-03-23 18:00:00 CET", "2017-03-23 19:00:00 CET", "2017-03-23 20:00:00 CET",
            "2017-03-23 21:00:00 CET", "2017-03-23 22:00:00 CET", "2017-03-23 23:00:00 CET", "2017-03-24 00:00:00 CET",
            "2017-03-24 01:00:00 CET", "2017-03-24 02:00:00 CET", "2017-03-24 03:00:00 CET", "2017-03-24 04:00:00 CET")
mytemp <- c(12, 13, 13, 15, 16, 15, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 10, 10, 9, 9)
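# data.frame() keeps mytemp numeric; cbind() would coerce both columns to character
mydataframe <- data.frame(mydate, mytemp, stringsAsFactors = FALSE)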
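To make the idea concrete, here is a minimal sketch of the interpolation described above, using base R's `approx()`. It assumes the readings are evenly spaced (one per hour), so the position in the vector can stand in for time; the names `idx`, `known` and `filled` are only illustrative. This is plain linear interpolation, so over a long gap it draws a straight line and will not reproduce the daily temperature curve.

```r
# Sample temperatures from the question: hourly values with a run of 10 NAs
mytemp <- c(12, 13, 13, 15, 16, 15, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 10, 10, 9, 9)

# Assumption: readings are evenly spaced, so the index serves as the time axis
idx   <- seq_along(mytemp)
known <- !is.na(mytemp)

# Linear interpolation through the known points, evaluated at every hour.
# With the default rule = 1, NAs before the first or after the last known
# value would stay NA (there are none in this sample).
filled <- approx(x = idx[known], y = mytemp[known], xout = idx)$y

print(round(filled, 2))
```

If the zoo package is available, `zoo::na.approx(mytemp)` should give a comparable result, and its `maxgap` argument can limit the fill to short runs only.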
CSV with all instances: https://wetransfer.com/downloads/a1806d8b04013e3ea4acee9bff746b1d20170803073703/8e6e4c