Fill in groups of NAs by interpolation between known values in R

Question

I am struggling to fill some NAs in an hourly temperature vector.

Out of 21885 observations, 472 are NA, scattered at random. They should be filled in a way that follows the shape of the temperature curve over the day.

The NAs come in runs: some are isolated, others form groups of 2, 3, 4 or more in a row. For a small group I could take the previous or the following value, but this won't work when the group is large.

I think an interpolation between the last known value and the next one would be ideal, but I have no clue how to do this as I am fairly new to R.

Thank you in advance for your time. Any advice on which function or approach to use for this problem would be very much appreciated.

Sample:


    mydate <- c("2017-03-23 09:00:00 CET", "2017-03-23 10:00:00 CET", "2017-03-23 11:00:00 CET", "2017-03-23 12:00:00 CET",
                "2017-03-23 13:00:00 CET", "2017-03-23 14:00:00 CET", "2017-03-23 15:00:00 CET", "2017-03-23 16:00:00 CET",
                "2017-03-23 17:00:00 CET", "2017-03-23 18:00:00 CET", "2017-03-23 19:00:00 CET", "2017-03-23 20:00:00 CET",
                "2017-03-23 21:00:00 CET", "2017-03-23 22:00:00 CET", "2017-03-23 23:00:00 CET", "2017-03-24 00:00:00 CET",
                "2017-03-24 01:00:00 CET", "2017-03-24 02:00:00 CET", "2017-03-24 03:00:00 CET", "2017-03-24 04:00:00 CET")
    mytemp <- c(12, 13, 13, 15, 16, 15, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 10, 10, 9, 9)

    # data.frame() keeps mytemp numeric; cbind() would coerce both columns to character
    mydataframe <- data.frame(mydate, mytemp)

CSV with all instances: https://wetransfer.com/downloads/a1806d8b04013e3ea4acee9bff746b1d20170803073703/8e6e4c

Ape · Accepted Answer · 2017-08-03T11:00:13.737

This function from the zoo package seems to do the job:

    zoo::na.fill(mytemp, fill = "extend")

    [1] 12.00000 13.00000 13.00000 15.00000 16.00000 15.00000 14.54545
    [8] 14.09091 13.63636 13.18182 12.72727 12.27273 11.81818 11.36364
    [15] 10.90909 10.45455 10.00000 10.00000  9.00000  9.00000
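
As a rough illustration of what this result amounts to (a sketch in base R, not a description of how zoo implements it), `approx()` can interpolate linearly over the positions of the vector. With `rule = 2`, the nearest known value is repeated past either end, which mirrors the "extend" behaviour. The names `idx`, `known` and `filled` are made up for this example.

```r
mytemp <- c(12, 13, 13, 15, 16, 15, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 10, 10, 9, 9)

idx   <- seq_along(mytemp)   # positions 1..20 stand in for equally spaced hours
known <- !is.na(mytemp)      # TRUE where a measurement exists

# Linear interpolation at every position, using only the known points.
# rule = 2 carries the first/last known value outward for leading/trailing NAs.
filled <- approx(x = idx[known], y = mytemp[known], xout = idx, rule = 2)$y

round(filled[7:16], 5)       # the ten values that were NA
```

The gap runs from 15 (position 6) to 10 (position 17), so each step drops by 5/11, which is where values such as 14.54545 come from.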

Edit: this question and its answer deal with the more general situation where the time points aren't equidistant, using zoo::na.approx. One difference is that na.approx does not fill leading and trailing NAs, while na.fill does (when fill = "extend").
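
To make that difference concrete, here is a minimal sketch on made-up data. The `hours` and `temp` vectors are hypothetical and not taken from the question. It assumes the zoo package is installed and passes the observation times through the `x` argument of `na.approx`.

```r
library(zoo)

# Hypothetical readings taken at irregular times (hours since the first reading)
hours <- c(0, 1, 2, 5, 6, 7)
temp  <- c(NA, 10, NA, 16, 17, NA)

# x = hours interpolates against elapsed time rather than vector position;
# na.rm = FALSE keeps the leading and trailing NAs in place instead of dropping them
by_time <- na.approx(temp, x = hours, na.rm = FALSE)

# For comparison: interior NAs interpolated by position, end NAs filled as well
by_position <- na.fill(temp, fill = "extend")

by_time
by_position
```

At hour 2 the time-aware version gives 11.5 (a quarter of the way from 10 at hour 1 to 16 at hour 5), while the position-based version gives 13 (the midpoint of its two neighbours). The first and last elements stay NA in `by_time` and are filled in `by_position`.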
