0

I have a dataframe with the following columns: electricity consumption E (over 24 hours), hour h and temperature t. I would like to extrapolate the consumption per hour for temperatures where I do not have data.

I have been following eddis's reply from Apply grouped model back onto data

combinedprofiles <- data.table(df)

#Make a model for each hour
my.models <- combined_profiles[, list(Model = list(lm(E ~ t))),
                keyby = h] 

#Make predictions on dataset
setkey(combined_profiles, hour)
combined_profiles[my.models, prediction := predict(i.Model[[1]], .SD), by = .EACHI]

I have tried adding a dataframe with the new temperatures as new data to the prediction.

  newtemp<- data.frame(temp_round=c(6,7))
  combined_profiles[my.models, prediction := predict(newdata=newtemp,i.Model[[1]], .SD), by = .EACHI]

but this gives me the following error: Error in se.fit || interval != "none" : invalid 'x' type in 'x || y'

Could anyone please help me how to change this so as to predict demand for temperatures outside the measured data.

For the iris example my question would be, how to extrapolate Sepal.Length for data where we don't have Sepal.Width.

Thanks!

maaar
  • 1
  • 1

1 Answers1

0

Interpolating

library(tidyverse)
library(data.table)

dplyr to clarify data.table solution you want:

df <- as_tibble(iris)
df
#> # A tibble: 150 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # ... with 140 more rows

We can just mutate() the fitted values

df %>%
  group_by(Species) %>% # for each Species
  mutate(
    pred = lm(Sepal.Length ~ Sepal.Width)$fitted.values
  )
#> # A tibble: 150 x 6
#> # Groups:   Species [3]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species  pred
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>   <dbl>
#>  1          5.1         3.5          1.4         0.2 setosa   5.06
#>  2          4.9         3            1.4         0.2 setosa   4.71
#>  3          4.7         3.2          1.3         0.2 setosa   4.85
#>  4          4.6         3.1          1.5         0.2 setosa   4.78
#>  5          5           3.6          1.4         0.2 setosa   5.12
#>  6          5.4         3.9          1.7         0.4 setosa   5.33
#>  7          4.6         3.4          1.4         0.3 setosa   4.99
#>  8          5           3.4          1.5         0.2 setosa   4.99
#>  9          4.4         2.9          1.4         0.2 setosa   4.64
#> 10          4.9         3.1          1.5         0.1 setosa   4.78
#> # ... with 140 more rows

data.table

For this df, we can apply same logic.

setDT(df)[, pred := lm(Sepal.Length ~ Sepal.Width)$fitted.values, by = Species]
  1. define new column pred by fitted values
  2. by each group Species

Then we get the same result:

df
#>      Sepal.Length Sepal.Width Petal.Length Petal.Width   Species     pred
#>   1:          5.1         3.5          1.4         0.2    setosa 5.055715
#>   2:          4.9         3.0          1.4         0.2    setosa 4.710470
#>   3:          4.7         3.2          1.3         0.2    setosa 4.848568
#>   4:          4.6         3.1          1.5         0.2    setosa 4.779519
#>   5:          5.0         3.6          1.4         0.2    setosa 5.124764
#>  ---                                                                     
#> 146:          6.7         3.0          5.2         2.3 virginica 6.611440
#> 147:          6.3         2.5          5.0         1.9 virginica 6.160673
#> 148:          6.5         3.0          5.2         2.0 virginica 6.611440
#> 149:          6.2         3.4          5.4         2.3 virginica 6.972054
#> 150:          5.9         3.0          5.1         1.8 virginica 6.611440

Extrapolating

First of all, the colname of newdata should be set same as the model.

newtemp <- data.frame(Sepal.Width = c(6, 7))

As doing aggregation in data.table, you might do .(predict(mod, newdata)):

dt <- as.data.table(df)

dt[, .(pred = predict(lm(Sepal.Length ~ Sepal.Width, data = .SD), newdata = newtemp)), by = Species]
#>       Species      pred
#> 1:     setosa  6.781940
#> 2:     setosa  7.472429
#> 3: versicolor  8.730201
#> 4: versicolor  9.595279
#> 5:  virginica  9.316043
#> 6:  virginica 10.217578

If you want newdata column for each group, you can just add the term inside the list .()

I implemented %>% for readability.

df %>%
  data.table() %>%
  .[,
    .(newdata = unlist(newtemp, use.names = FALSE),
      pred = predict(lm(Sepal.Length ~ Sepal.Width, data = .SD), newdata = newtemp)),
    by = Species]
#>       Species newdata      pred
#> 1:     setosa       6  6.781940
#> 2:     setosa       7  7.472429
#> 3: versicolor       6  8.730201
#> 4: versicolor       7  9.595279
#> 5:  virginica       6  9.316043
#> 6:  virginica       7 10.217578
younggeun
  • 923
  • 1
  • 12
  • 19