I have a large hourly time series data set showing temperatures at different times. There were a number of missing values (NA) in the series so I used linear interpolation to impute the missing values using the imputeTS package. Before the interpolation I was told to create a column for the imputed values as a zoo object. This replaced any NA temperatures with imputed ones.

I am doing heating degree day analysis, which estimates the heating required to bring a building up to room temperature. If the outside temperature is below 15.5 degrees, heating is required. I want to ignore (or set to NA) values above 15.5 and focus only on the temperatures below. I would then like to calculate the heating degree days, which for each hourly reading is (15.5 - Temp) * 1/24 (there are 24 hours in a day). This is usually simple, but I am having trouble with the zoo object. Can anyone help?

An example of the data is:

DateTimes <- as.POSIXct(c("2009-01-01 00:00:00", "2009-01-01 01:00:00", "2009-01-01 02:00:00", "2009-01-01 03:00:00", "2009-01-01 04:00:00", "2009-01-01 05:00:00", "2009-01-01 06:00:00"))
MeanTemp <- c(0.8, 0.7, 0.7, NA, 0.8, 0.9, 1.1)

HourTemp <- data.frame(DateTimes, MeanTemp) 

These are my imputation steps:

library(zoo)

# Use linear interpolation to impute missing values
TempImp <- zoo(HourTemp$MeanTemp, HourTemp$DateTimes)
TempImp <- imputeTS::na.interpolation(TempImp, option = "linear")
#Add imputed values to data
as.data.frame(HourTemp)
HourTemp$airTempImp <- round(TempImp,1)
#Add imputed flag
HourTemp$Imputed <- ifelse(is.na(HourTemp$MeanTemp), "Imputed", "Observed")
HourTemp

The imputations worked successfully, replacing NA values with estimates but I cannot manipulate the zoo object 'airTempImp' to create a heating degree days column as specified in the opening paragraph.

I have tried using ifelse, ifelse.zoo, transform but none seem to be working!

Thanks!

EllisR8
  • @RonakShah I have edited the post, hope this helps! – EllisR8 Sep 18 '19 at 10:21
  • @RonakShah Hi sorry, I explained it in the opening paragraph. I want to set temperatures above 15.5 to NA and then convert the temperatures into heating degree days as stated using the formula shown. – EllisR8 Sep 18 '19 at 10:41
  • (1) The `DateTimes <-` line has a syntax error. Please fix. (2) If you are using zoo anyways then you can use `na.approx` from that package.. No need for other packages. (3) The code shown could be better written as `transform(HourTemp, airTempImp = round(na.approx(MeanTemp, na.rm = FALSE), 1), Imputed = ifelse(is.na(MeanTemp), "Imputed", "Observed"))` (4) exactly what is the code that you are having problem with??? – G. Grothendieck Sep 18 '19 at 12:25
  • @G.Grothendieck (1) Sorry about the syntax error in the date times, I already had a column for datetime so just used that to show what it should be. (2)/(3) Okay that looks better! (4) I am having trouble converting the airTempImp into Heating degree days as specified. I am able to do so as a character but am having problems as a zoo object. – EllisR8 Sep 18 '19 at 12:41

2 Answers


It sounds like you haven't converted the zoo object to a more generic R object (but you haven't given an error message or code that produces it, so I can't be 100% sure).

In that case, you can use the as.vector function (see https://www.rdocumentation.org/packages/zoo/versions/1.8-6/topics/as.zoo) to convert a zoo object into a plain vector, which you can then add to a data.frame.
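As a minimal sketch of that conversion on its own, assuming a small made-up series (the values and the names `z`, `filled`, and `vals` are hypothetical, not the asker's data):

```r
library(zoo)

# Hypothetical three-hour series with one missing reading
idx <- as.POSIXct(c("2009-01-01 00:00:00", "2009-01-01 01:00:00",
                    "2009-01-01 02:00:00"), tz = "UTC")
z <- zoo(c(1.5, NA, 2.5), idx)

# Linear interpolation still returns a zoo object
filled <- na.approx(z, na.rm = FALSE)

# as.vector drops the time index and leaves an ordinary numeric vector
vals <- as.vector(filled)
```

Once the values are an ordinary numeric vector, base functions such as ifelse should behave as they do on any other data.frame column.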

The example code below drops imputeTS, as G. Grothendieck suggests in his comment, since zoo's na.approx already does linear interpolation.

# install.packages("zoo")
library("zoo")

DateTimes <- as.POSIXct(c(
  "2009-01-01 00:00:00", "2009-01-01 01:00:00",
  "2009-01-01 02:00:00", "2009-01-01 03:00:00",
  "2009-01-01 04:00:00", "2009-01-01 05:00:00", "2009-01-01 06:00:00"))
MeanTemp <- c(0.8, 0.7, 0.7, NA, 0.8, 0.9, 1.1)
HourTemp <- data.frame(DateTimes, MeanTemp)
TempImp <- zoo(HourTemp$MeanTemp, HourTemp$DateTimes)

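# use zoo's linear interpolation; na.rm = FALSE keeps any leading or
# trailing NAs so the result has the same length as the data frame
HourTemp$airTempImp <- as.vector(na.approx(TempImp, na.rm = FALSE))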
HourTemp$Imputed <- ifelse(is.na(HourTemp$MeanTemp), "Imputed", "Observed")

# sets the value to 0 (no heating) if temp > 15.5,
# else calculates the heating degree day contribution for that hour
HourTemp$HeatingDegreeDay <- ifelse(
  HourTemp$airTempImp > 15.5,
  0, # no heating
  (15.5 - HourTemp$airTempImp) / 24
)

which will output:

HourTemp
            DateTimes MeanTemp airTempImp  Imputed HeatingDegreeDay
1 2009-01-01 00:00:00      0.8       0.80 Observed        0.6125000
2 2009-01-01 01:00:00      0.7       0.70 Observed        0.6166667
3 2009-01-01 02:00:00      0.7       0.70 Observed        0.6166667
4 2009-01-01 03:00:00       NA       0.75  Imputed        0.6145833
5 2009-01-01 04:00:00      0.8       0.80 Observed        0.6125000
6 2009-01-01 05:00:00      0.9       0.90 Observed        0.6083333
7 2009-01-01 06:00:00      1.1       1.10 Observed        0.6000000
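
The question also asked for temperatures above 15.5 to be set to NA rather than 0. A rough sketch of that variant, with a daily total added, might look like the following. The temperatures, the `days` vector, and the names `base_temp`, `hdd`, and `daily_hdd` are made up for illustration:

```r
base_temp <- 15.5  # heating threshold from the question

# Hypothetical hourly temperatures; the last one is above the threshold
temps <- c(0.8, 0.7, 0.75, 16.2)
days  <- as.Date(c("2009-01-01", "2009-01-01", "2009-01-01", "2009-01-02"))

# NA where no heating is needed, otherwise (15.5 - Temp) / 24
hdd <- ifelse(temps > base_temp, NA, (base_temp - temps) / 24)

# Sum the hourly contributions per day, skipping the NA hours
daily_hdd <- tapply(hdd, days, sum, na.rm = TRUE)
```

Note that with na.rm = TRUE a day made up entirely of NA hours sums to 0, so the daily totals come out the same as with the 0-based version above.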
Alois Klink
  • Thanks!! This sorted out the problem perfectly! I had to include na.rm = FALSE in the na.approx. – EllisR8 Sep 18 '19 at 14:16
  • `na.approx` works directly on ordinary vectors as illustrated in the code in my comment under the question thus there is no need to convert to zoo and back in the first place. – G. Grothendieck Sep 18 '19 at 15:23

Your solution is more complicated than it needs to be. Since you seem to want a data.frame anyway, you do not need to convert your data to a zoo object.

Just apply na_interpolation from imputeTS directly to the data.frame (imputeTS can handle many kinds of input, e.g. data.frame, vector, zoo, ts, xts, tibble, tsibble).

It's just:

library(imputeTS)
DateTimes <- as.POSIXct(c("2009-01-01 00:00:00", "2009-01-01 01:00:00", 
  "2009-01-01 02:00:00", "2009-01-01 03:00:00", "2009-01-01 04:00:00",
  "2009-01-01 05:00:00", "2009-01-01 06:00:00"))

MeanTemp <- c(0.8, 0.7, 0.7, NA, 0.8, 0.9, 1.1)
HourTemp <- data.frame(DateTimes, MeanTemp)

Imputed <- imputeTS::na_interpolation(HourTemp, option = "linear")

imputeTS will just ignore the date column in this case and fill in the missing values in the data column.
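
To get from there to the heating degree days the question asks about, one possible sketch is below. It passes only the temperature vector to na_interpolation and uses pmax as an alternative to ifelse; the names `airTempImp` and `hdd` are illustrative, and pmax is swapped in here rather than taken from the question:

```r
library(imputeTS)

MeanTemp <- c(0.8, 0.7, 0.7, NA, 0.8, 0.9, 1.1)

# Interpolate the plain numeric vector; the result is again a numeric vector
airTempImp <- na_interpolation(MeanTemp, option = "linear")

# pmax clamps negative differences (temperatures above 15.5) to 0
hdd <- pmax(15.5 - airTempImp, 0) / 24
```

This gives 0 rather than NA for hours above 15.5, which may or may not be what the analysis needs.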

Steffen Moritz