I am trying to interpolate/extrapolate NA values. My dataset comes from a measuring station that records soil temperature at 4 depths every 5 minutes. In this specific example there are erroneous values (-888.88) at the end of the measurements for the 0 cm depth variable and the 1.5 cm depth variable, which I converted to NA. Now my professor wants me to interpolate/extrapolate these values for this and all other datasets that I have. I am aware that extrapolating so many values after the last observation could be statistically inaccurate, but I am trying to at least come up with working code. So far I have tried to extrapolate one of the variables (SoilTemp_1.5cm). The final line runs, but when I open the data frame the NAs are still there.

library(dplyr)
library(Hmisc)

MyD <- read.csv("2319538_Bodentemp_braun_WILDKOGEL_17_18 - Copy.csv",header=TRUE, sep=";")

MyD$date <- as.Date(MyD$Date, "%d.%m.%Y")
MyD$datetime <- as.POSIXct(MyD$Date.Time..GMT.01.00, format = "%d.%m.%Y %H:%M")

MyD[,-c(1,2,3,4,9)][MyD[,-c(1,2,3,4,9)] == -888.88] <- NA #convert erroneous data to NA


MyD %>%  mutate(`SoilTemp_1.5cm`=approxExtrap(x=SoilTemp_5cm, y=SoilTemp_1.5cm, xout=SoilTemp_5cm)$y)

I also tried it this way, which gives me a list of 2 that ends up with a lot of columns instead of rows when I convert it to a data frame. To be honest, the approxExtrap syntax confuses me a little.

MyD1 <- approxExtrap(MyD$SoilTemp_5cm, MyD$SoilTemp_1.5cm,xout=MyD$SoilTemp_5cm)
MyD1

I am honestly not sure how to reproduce the data, so here is a pastebin link to the dput() output: https://pastebin.com/NFZdmm4L. I tried to include as much output as I could. Keep in mind that I excluded some of the columns when running dput(), so the column indices in MyD[,-c(1,2,3,4,9)][MyD[,-c(1,2,3,4,9)] == -888.88] might differ. In any case, the dput() output already has the NAs included, so you might not even need that line.

Thanks in advance.

Best regards,

Zorin

1 Answer

na.approx will fill in NAs with linearly interpolated values, and rule = 2 will extend the first and last non-NA values to fill any leading and trailing NAs.

library(zoo)

x <- c(NA, 4, NA, 5, NA) # test input

na.approx(x, rule = 2)
## [1] 4.0 4.0 4.5 5.0 5.0
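
Note that na.approx returns a new vector and does not modify its input, so the result has to be assigned back to the column. A minimal sketch, assuming a small made-up data frame `soil` that stands in for the real MyD (the values are invented for illustration):

```r
library(zoo)

# Hypothetical stand-in for MyD: a complete 5 cm series and a 1.5 cm
# series with one interior gap and two trailing NAs.
soil <- data.frame(
  SoilTemp_5cm   = c(2.0, 2.1, 2.3, 2.4, 2.6, 2.7),
  SoilTemp_1.5cm = c(1.0, NA, 2.0, 3.0, NA, NA)
)

# Assign the result back; without the assignment the data frame is unchanged.
# rule = 2 carries the last observed value forward into the trailing NAs.
soil$SoilTemp_1.5cm <- na.approx(soil$SoilTemp_1.5cm, rule = 2, na.rm = FALSE)

soil$SoilTemp_1.5cm
## [1] 1.0 1.5 2.0 3.0 3.0 3.0
```

The same applies to a dplyr pipeline: the output of mutate() has to be assigned, e.g. `soil <- soil %>% mutate(...)`, otherwise it is only printed.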
G. Grothendieck
  • I ran the following code and when I open the data frame again nothing has changed. `library(zoo) MyD <- read.csv("2319538_Bodentemp_braun_WILDKOGEL_17_18 - Copy.csv",header=TRUE, sep=";") MyD$date <- as.Date(MyD$Date, "%d.%m.%Y") MyD$datetime <- as.POSIXct(MyD$Date.Time..GMT.01.00, format = "%d.%m.%Y %H:%M") MyD[,-c(1,2,3,4,9)][MyD[,-c(1,2,3,4,9)] == -888.88] <- NA na.approx(MyD$SoilTemp_1.5cm, rule=2) ` Also, is there a way to base the interpolation/extrapolation of the "SoilTemp_1.5cm" variable on the "SoilTemp_5cm" variable? The "SoilTemp_5cm" variable doesn't have any NAs. – Zorin Ivanov Jun 16 '20 at 09:51
  • Please read the instructions for posting at the top of the [tag:r] tag page. In particular, a minimal version of the input sufficient to illustrate your problem should be provided using dput so that it is reproducible. – G. Grothendieck Jun 16 '20 at 12:36
  • But I have provided a pastebin link with the dput() output. If you haven't seen it, please look at my initial post. If you have seen it and you are writing this because the dput() output is too large for your liking, I will reduce it. I thought it was a good idea to provide as large a sample as possible so that the extrapolation result is as close as possible to that of the real dataset. – Zorin Ivanov Jun 16 '20 at 13:40
  • The question should be self contained and reduced to a minimal form so that responders can quickly answer it. – G. Grothendieck Jun 16 '20 at 13:47
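
On the follow-up about basing the fill on SoilTemp_5cm: na.approx only interpolates along the series itself, so it does not use a second variable. One possible approach, which is not what the answer above proposes, is to fit a linear regression on the rows where both depths were measured and predict the missing values from it. A rough sketch, assuming made-up data in a hypothetical data frame `soil` and assuming that a linear relationship between the two depths is reasonable:

```r
# Hypothetical stand-in for MyD: SoilTemp_5cm is complete,
# SoilTemp_1.5cm is missing at the end of the record.
soil <- data.frame(
  SoilTemp_5cm   = c(2.0, 3.0, 4.0, 5.0, 6.0, 7.0),
  SoilTemp_1.5cm = c(1.0, 3.0, 5.0, 7.0, NA, NA)
)

# Fit on the complete rows; lm() drops rows with NA by default.
fit <- lm(SoilTemp_1.5cm ~ SoilTemp_5cm, data = soil)

# Predict only where the 1.5 cm value is missing, and assign the result back.
missing <- is.na(soil$SoilTemp_1.5cm)
soil$SoilTemp_1.5cm[missing] <- predict(fit, newdata = soil[missing, ])

soil$SoilTemp_1.5cm
```

Whether a straight-line fit is adequate for real soil temperature data is a separate question; the sketch only shows the mechanics of filling one column from another.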