suppressPackageStartupMessages(library(dplyr))
library(gapminder)
library(magrittr)
library(ggplot2)
library(broom)

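# Fit one linear model per country, using all of that country's rows
fits <- gapminder %>%
  group_by(country) %>%
  do(fit = lm(lifeExp ~ year + pop, data = .))

# Add fitted values and residuals for every row of every model
result <- fits %>%
  augment(fit)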
result

I want to fit a model that uses different data for every row of a data frame. The code above does what I want (one model per country), except that the data for each row should be restricted to the years before the year in that row. For example, for row 6 the model should only use years earlier than 1977, the value in the year field of that row (so here the first 5 rows), still grouped by Afghanistan (the grouping already works in the code); row 7 should only use years earlier than 1982, and so on. So every row gets its own model, fitted on different data.

    country     continent year lifeExp      pop gdpPercap
1   Afghanistan Asia      1952  28.801  8425333  779.4453
2   Afghanistan Asia      1957  30.332  9240934  820.8530
3   Afghanistan Asia      1962  31.997 10267083  853.1007
4   Afghanistan Asia      1967  34.020 11537966  836.1971
5   Afghanistan Asia      1972  36.088 13079460  739.9811
6   Afghanistan Asia      1977  38.438 14880372  786.1134
7   Afghanistan Asia      1982  39.854 12881816  978.0114
8   Afghanistan Asia      1987  40.822 13867957  852.3959
9   Afghanistan Asia      1992  41.674 16317921  649.3414
10  Afghanistan Asia      1997  41.763 22227415  635.3414
11  Afghanistan Asia      2002  42.129 25268405  726.7341
12  Afghanistan Asia      2007  43.828 31889923  974.5803
gilberke
  • Maybe you need `gapminder %>% group_by(country) %>% nest %>% head(3) %>% mutate(data = map2(data, c(1977, 1982, 1989), ~ .x %>% filter(year <= .y) %>% mutate(fit = list(lm(lifeExp ~ year + pop, .)))))` (using only the first three countries) – akrun Feb 10 '19 at 11:56
  • or to apply `augment` `gapminder %>% group_by(country) %>% nest %>% head(3) %>% mutate(data = map2(data, c(1977, 1982, 1989), ~ .x %>% filter(year <= .y) %>% summarise(fit = list(lm(lifeExp ~ year + pop, .))))) %>% mutate(out = map(data, ~ .x %>% augment(fit))) %>% select(-data) %>% unnest` – akrun Feb 10 '19 at 12:00
  • Thank you, but I think this code filters with a year that differs per country, whereas I want a different year for every row. For example, 1977 is for the 6th record of Afghanistan and 1982 is for the 7th record of Afghanistan. The year to filter with is already in the row. – gilberke Feb 10 '19 at 12:25
  • It is not clear to me – akrun Feb 10 '19 at 15:03
  • What about for row 1 or row2 ? – akrun Feb 10 '19 at 15:23
  • Sorry, that was not very clear. The first rows of a country will have no data or bad models. If we look at Afghanistan: for the first row -> no data, because there are no data before 1952, so there will be no result | second row -> one row with a year before 1957, so the model will be built with one row of data, a result that is of no use | 3rd row -> the model will be built with 2 rows of data | ... 12th row -> the model will be built with 11 rows of data. I'm not interested in the results of the models for the first 5 rows of a country; there is not enough data to give a good result. – gilberke Feb 10 '19 at 15:41
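The per-row slicing described in the last comment amounts to an expanding-window fit. Below is a minimal sketch of one way it might be done, not an answer from the thread. It assumes dplyr 0.8.1 or later (for `group_modify()`) and purrr. The helper `fit_before()`, its `min_rows` argument, and the result name `expanding_fits` are hypothetical names introduced for illustration. The cutoff of 5 earlier rows is an assumption taken from the remark that the first 5 rows of a country are not of interest.

```r
suppressPackageStartupMessages(library(dplyr))
library(purrr)
library(gapminder)

# Hypothetical helper: fit lifeExp ~ year + pop on the rows of `d`
# whose year is strictly earlier than `yr`. Returns NULL when fewer
# than `min_rows` earlier rows exist (min_rows = 5 is an assumption).
fit_before <- function(d, yr, min_rows = 5) {
  past <- d[d$year < yr, ]
  if (nrow(past) < min_rows) return(NULL)
  lm(lifeExp ~ year + pop, data = past)
}

# For every country, store one model per row in a list column `fit`;
# each model only sees that country's rows from earlier years.
expanding_fits <- gapminder %>%
  group_by(country) %>%
  group_modify(function(d, key) {
    d$fit <- map(d$year, function(yr) fit_before(d, yr))
    d
  }) %>%
  ungroup()

# Inspect the model attached to Afghanistan's 1977 row (its 6th row),
# which was fitted on the 1952-1972 rows only.
afg <- filter(expanding_fits, country == "Afghanistan")
summary(afg$fit[[6]])
```

Rows without enough history keep a `NULL` in the `fit` column, so they should be dropped (for example with `filter(!map_lgl(fit, is.null))`) before mapping `augment()` or `predict()` over the models.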

0 Answers