I am trying to interpolate/extrapolate NA values. My dataset comes from a measuring station that records soil temperature at 4 depths every 5 minutes. In this specific example there are erroneous values (-888.88) at the end of the measurements for the 0 cm depth variable and the 1.5 cm depth variable, which I converted to NA. Now my professor wants me to interpolate/extrapolate these values, for this and all other datasets that I have. I am aware that extrapolating so many values after the last observation could be statistically inaccurate, but I am trying to at least come up with working code. So far I have tried to extrapolate one of the variables (SoilTemp_1.5cm). The final line runs, but when I open the data frame, the NAs are still there.
library(dplyr)
library(Hmisc)
MyD <- read.csv("2319538_Bodentemp_braun_WILDKOGEL_17_18 - Copy.csv",header=TRUE, sep=";")
MyD$date <- as.Date(MyD$Date, "%d.%m.%Y")
MyD$datetime <- as.POSIXct(MyD$Date.Time..GMT.01.00, format = "%d.%m.%Y %H:%M")
MyD[,-c(1,2,3,4,9)][MyD[,-c(1,2,3,4,9)] == -888.88] <- NA #convert erroneous data to NA
MyD %>% mutate(`SoilTemp_1.5cm`=approxExtrap(x=SoilTemp_5cm, y=SoilTemp_1.5cm, xout=SoilTemp_5cm)$y)
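A hedged sketch of one possible reason the NAs remain: mutate() returns a new data frame and does not change MyD in place, so the result has to be assigned back. The toy data frame df and its values are made up for illustration. Base R's approx() with rule = 2 is used here instead of approxExtrap(); it carries the last observed value forward rather than extrapolating linearly. The time stamp, not another temperature column, is assumed to be the sensible x.

```r
library(dplyr)

# Toy data (assumption): six readings 5 minutes apart, NAs in the middle and at the end
df <- data.frame(
  datetime = as.POSIXct("2018-01-01 00:00", tz = "UTC") + 300 * 0:5,
  SoilTemp_1.5cm = c(1.0, 1.2, NA, 1.6, NA, NA)
)

# Assign the result back; without "df <-" the filled column is only printed
df <- df %>%
  mutate(SoilTemp_1.5cm = approx(
    x    = as.numeric(datetime),   # time as the independent variable
    y    = SoilTemp_1.5cm,         # approx() drops the NA pairs itself
    xout = as.numeric(datetime),   # evaluate at every time stamp
    rule = 2                       # outside the data: repeat the nearest value
  )$y)

print(df)
```

With rule = 2 the trailing NAs become the last valid reading, which is a conservative choice; whether that is acceptable for long gaps is a separate question.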
I also tried it this way, which gives me a list of 2 that ends up with a lot of columns instead of rows when I convert it to a data frame. To be honest, the approxExtrap syntax confuses me a little.
MyD1 <- approxExtrap(MyD$SoilTemp_5cm, MyD$SoilTemp_1.5cm,xout=MyD$SoilTemp_5cm)
MyD1
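For what it is worth, approxExtrap() returns a list with two elements, x (the requested xout values) and y (the interpolated/extrapolated values), so only the y element is the vector that belongs in a column. A minimal sketch with made-up numbers, assuming a simple time index as x and dropping the NA rows before the call:

```r
library(Hmisc)

# Toy data (assumption): a time index and a series with trailing NAs
t_idx <- 1:6
temp  <- c(2.0, 2.5, 3.0, 3.5, NA, NA)

# Use only the observed points as the known (x, y) pairs
ok  <- !is.na(temp)
res <- approxExtrap(x = t_idx[ok], y = temp[ok], xout = t_idx)

str(res)            # a list with elements x and y, not a data frame

# Keep only the y element; it has one value per entry in xout
filled <- res$y
print(filled)
```

Here the trailing values continue the slope of the last two observed points, which is what linear extrapolation means in this function.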
I am honestly not sure how to reproduce the data, so here is a pastebin link to a dput() output (https://pastebin.com/NFZdmm4L). I tried to include as much output as I could. Keep in mind that I excluded some of the columns when running dput(), so the column indices in the line MyD[,-c(1,2,3,4,9)][MyD[,-c(1,2,3,4,9)] == -888.88] might differ. In any case, the dput() output already has the NAs included, so you might not even need that line.
Thanks in advance.
Best regards,
Zorin