2

I am trying to add a set of extrapolated "observations" to a matrix in R. I know how to do this using normal programming techniques (read; bunch of nested loops and functions) but I feel this must be possible in a much more clean way by using build in R-functionality.

The code below illustrates the point, and where it breaks down

Many thanks in advance for your help!

With kind regards

Sylvain

library(dplyr)

# The idea is that i have a table of observations for e.g. x=5, 6, 7, 8, 9 and 10. The observations (in this example 2)
# conform fairly decently to sets of 2nd order polynomials.
# Now, I want to add an extrapolated value to this table (e.g. x=4). I know how to do this programmically 
# but I feel there must be a cleaner solution to do this. 

#generate dummy data table
x <- 5:10
myData <- tibble(x, a = x^2 * 2 + x * 3 + 4 + rnorm(1,0,0.01), b = x^2 * 3 + x * 4 + 5 + rnorm(1,0,0.01)   )

#Gather (put in Data-Key format)
myDataKeyFormat <- gather(myData,key = "someLabel", value = "myObservation", -x)
fitted_models <- myDataKeyFormat %>% group_by(someLabel) %>% do(model = lm(myObservation ~ poly(x,2), data = .))
myExtrapolatedDataPointx <- tibble(x = 4)

#Add the x=4 field
fitted_points <- fitted_models %>% group_by(someLabel) %>% do(predict(.$model,myExtrapolatedDataPointx)) #R really doesnt like this bit

#append the fitted_points to the myDataKeyFormat
myDataKeyFormatWithExtrapolation <- union(myDataKeyFormat,fitted_points)

#use spread to 
myDataWithExtrapolation <- myDataKeyFormatWithExtrapolation %>% spread(someLabel,myObservation)
Sylvain
  • 47
  • 5

1 Answers1

1

Here is a solution in the tidyverse, and using purrr to create the different models. The idea is to nest (using tidyr::nest) and then purrr::map to train the model. I will then add new values and compute the predictions using modelr::add_predictions. Here you have all the data in the same place : training data, models, testing data and prediction, by your variable someLabel. I also give you a way to visualise the data. You can check R for Data Science by Hadley Wickham & Garrett Grolemund, and especially the part about models for more information.

library(dplyr)
library(tibble)
library(tidyr)
library(purrr)
library(modelr)
library(ggplot2)

set.seed(1) # For reproducibility
x <- 5:10
myData <- tibble(x, 
                 a = x^2 * 2 + x * 3 + 4 + rnorm(1,0,0.01), 
                 b = x^2 * 3 + x * 4 + 5 + rnorm(1,0,0.01))

#Gather (put in Data-Key format)
myDataKeyFormat <- gather(myData,key = "someLabel", value = "myObservation", -x)

myModels <- myDataKeyFormat %>% 
  nest(-someLabel) %>% 
  mutate(model = map(data, ~lm(myObservation ~ poly(x,2), data = .x)))

Here is the result at this point : you have a model for each value of someLabel.

# A tibble: 2 × 3
  someLabel             data    model
      <chr>           <list>   <list>
1         a <tibble [6 × 2]> <S3: lm>
2         b <tibble [6 × 2]> <S3: lm>

I'll add some data points in a new column (map is to create it as a tibble for each line of the data frame).

# New data
new_data <- myModels %>% 
  mutate(new = map(data, ~tibble(x = c(3, 4, 11, 12))))

I add the predictions: add_predictions take a data frame and a model as argument, so I use map2 to map over the new data and the models.

fitted_models <- new_data %>% 
  mutate(new = map2(new, model, ~add_predictions(.x, .y)))
fitted_models
# A tibble: 2 × 4
  someLabel             data    model              new
      <chr>           <list>   <list>           <list>
1         a <tibble [6 × 2]> <S3: lm> <tibble [4 × 2]>
2         b <tibble [6 × 2]> <S3: lm> <tibble [4 × 2]>

There you go: you have for each label the data and model trained on this data, and the new data with predictions. In order to plot it, I use unnest to take the data back to the data frame, and I bind the rows to have the "old" data and the new values together.

my_points <- bind_rows(unnest(fitted_models, data),
          unnest(fitted_models, new))

ggplot(my_points)+
  geom_point(aes(x = x, y = myObservation), color = "black") +
  geom_point(aes(x = x, y = pred), color = "red")+
  facet_wrap(~someLabel)

Models

FlorianGD
  • 2,336
  • 1
  • 15
  • 32