Interpolating
library(tidyverse)
library(data.table)
dplyr
to clarify data.table
solution you want:
df <- as_tibble(iris)
df
#> # A tibble: 150 x 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> # ... with 140 more rows
We can just mutate()
the fitted values
df %>%
group_by(Species) %>% # for each Species
mutate(
pred = lm(Sepal.Length ~ Sepal.Width)$fitted.values
)
#> # A tibble: 150 x 6
#> # Groups: Species [3]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species pred
#> <dbl> <dbl> <dbl> <dbl> <fct> <dbl>
#> 1 5.1 3.5 1.4 0.2 setosa 5.06
#> 2 4.9 3 1.4 0.2 setosa 4.71
#> 3 4.7 3.2 1.3 0.2 setosa 4.85
#> 4 4.6 3.1 1.5 0.2 setosa 4.78
#> 5 5 3.6 1.4 0.2 setosa 5.12
#> 6 5.4 3.9 1.7 0.4 setosa 5.33
#> 7 4.6 3.4 1.4 0.3 setosa 4.99
#> 8 5 3.4 1.5 0.2 setosa 4.99
#> 9 4.4 2.9 1.4 0.2 setosa 4.64
#> 10 4.9 3.1 1.5 0.1 setosa 4.78
#> # ... with 140 more rows
data.table
For this df
, we can apply same logic.
setDT(df)[, pred := lm(Sepal.Length ~ Sepal.Width)$fitted.values, by = Species]
- define new column
pred
by fitted values
by
each group Species
Then we get the same result:
df
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species pred
#> 1: 5.1 3.5 1.4 0.2 setosa 5.055715
#> 2: 4.9 3.0 1.4 0.2 setosa 4.710470
#> 3: 4.7 3.2 1.3 0.2 setosa 4.848568
#> 4: 4.6 3.1 1.5 0.2 setosa 4.779519
#> 5: 5.0 3.6 1.4 0.2 setosa 5.124764
#> ---
#> 146: 6.7 3.0 5.2 2.3 virginica 6.611440
#> 147: 6.3 2.5 5.0 1.9 virginica 6.160673
#> 148: 6.5 3.0 5.2 2.0 virginica 6.611440
#> 149: 6.2 3.4 5.4 2.3 virginica 6.972054
#> 150: 5.9 3.0 5.1 1.8 virginica 6.611440
Extrapolating
First of all, the colname of newdata
should be set same as the model.
newtemp <- data.frame(Sepal.Width = c(6, 7))
As doing aggregation in data.table
, you might do .(predict(mod, newdata))
:
dt <- as.data.table(df)
dt[, .(pred = predict(lm(Sepal.Length ~ Sepal.Width, data = .SD), newdata = newtemp)), by = Species]
#> Species pred
#> 1: setosa 6.781940
#> 2: setosa 7.472429
#> 3: versicolor 8.730201
#> 4: versicolor 9.595279
#> 5: virginica 9.316043
#> 6: virginica 10.217578
If you want newdata
column for each group, you can just add the term inside the list .()
I implemented %>%
for readability.
df %>%
data.table() %>%
.[,
.(newdata = unlist(newtemp, use.names = FALSE),
pred = predict(lm(Sepal.Length ~ Sepal.Width, data = .SD), newdata = newtemp)),
by = Species]
#> Species newdata pred
#> 1: setosa 6 6.781940
#> 2: setosa 7 7.472429
#> 3: versicolor 6 8.730201
#> 4: versicolor 7 9.595279
#> 5: virginica 6 9.316043
#> 6: virginica 7 10.217578