My data set:

structure(list(year = 2010:2019, pop = c(9574323, 9657592, 9749476, 
9843336, 9932887, 10031646, 10154788, 10268233, 10381615, 10488084
), ye = 1:10), row.names = c("1", "2", "3", "4", "5", "6", "7", 
"8", "9", "10"), class = "data.frame")

I only need the linear regression of the year and pop columns. When I run summary(lm) on those two columns, this is what I get:

> summary(lm(pop~year, data = this))

Call:
lm(formula = pop ~ year, data = this)

Residuals:
     Min       1Q   Median       3Q      Max 
-27821.4 -10094.9    656.5  12968.3  27549.8 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -196556312    4240960  -46.35 5.19e-11 ***
year            102539       2105   48.71 3.49e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 19120 on 8 degrees of freedom
Multiple R-squared:  0.9966,    Adjusted R-squared:  0.9962 
F-statistic:  2372 on 1 and 8 DF,  p-value: 3.493e-11

The slope-intercept equation is not correct. But when I run lm using the ye column, it's correct.

summary(lm(pop~ye, data = this))

Call:
lm(formula = pop ~ ye, data = this)

Residuals:
     Min       1Q   Median       3Q      Max 
-27821.4 -10094.9    656.5  12968.3  27549.8 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  9444234      13062  723.00  < 2e-16 ***
ye            102539       2105   48.71 3.49e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 19120 on 8 degrees of freedom
Multiple R-squared:  0.9966,    Adjusted R-squared:  0.9962 
F-statistic:  2372 on 1 and 8 DF,  p-value: 3.493e-11

This isn't what I'm looking for, because I want to predict for the years 2020, 2021, and so on. What do I need to change to make the year column work in the equation? I tried this in Excel too, and it's the same thing.

paqmo
DaCrDg

2 Answers

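This answer from Cross Validated covers your question in great detail, but the short answer is that the two models are equivalent except for the intercept term. With year as the predictor, the intercept is the fitted value at year 0, far outside the data, which is why it is so large and negative; with ye, it is the fitted value at ye = 0, i.e. 2009.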

For interpretability, you might want to set a reference year, then set the regression's year data based on that reference year (e.g. 2010 = reference year 0, 2015 = year 5), much like you've done with the ye column.
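A minimal sketch of that equivalence, using the data from the question. The data frame is called `this` as in the question's calls; the column name `yr0` and the object names `fit_year` and `fit_yr0` are made up here for illustration.

```r
# Data from the question
this <- data.frame(
  year = 2010:2019,
  pop  = c(9574323, 9657592, 9749476, 9843336, 9932887,
           10031646, 10154788, 10268233, 10381615, 10488084)
)

# Hypothetical reference-year column: 2010 becomes 0, 2015 becomes 5
this$yr0 <- this$year - 2010

fit_year <- lm(pop ~ year, data = this)
fit_yr0  <- lm(pop ~ yr0,  data = this)

# Both fits have the same slope
same_slope <- all.equal(unname(coef(fit_year)["year"]),
                        unname(coef(fit_yr0)["yr0"]))

# Both fits give the same fitted values, so they describe the same line
same_fit <- all.equal(fitted(fit_year), fitted(fit_yr0))

# Only the intercept differs: for fit_yr0 it is the fitted value at 2010
intercept_yr0 <- unname(coef(fit_yr0)["(Intercept)"])
fitted_2010   <- unname(predict(fit_year, newdata = data.frame(year = 2010)))

print(c(same_slope = same_slope, same_fit = same_fit))
print(c(intercept_yr0 = intercept_yr0, fitted_2010 = fitted_2010))
```

Shifting the predictor by a constant only moves where the intercept is evaluated, so the reference year is purely a choice of what you want the intercept to mean.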

The other answer suggests using predict() for 2020 and 2021, which works with either model: pass c(2020, 2021) for the year model, or c(11, 12) for the ye model, since ye = 1 corresponds to 2010.
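As a rough check that both routes give the same forecasts, here is a sketch using the question's data. It assumes the mapping ye = year - 2009 (ye = 1 is 2010), so 2020 and 2021 correspond to ye = 11 and 12; the object names `fit_year`, `fit_ye`, `pred_year` and `pred_ye` are illustrative.

```r
# Data from the question, including the ye column (ye = 1 is 2010)
this <- data.frame(
  year = 2010:2019,
  pop  = c(9574323, 9657592, 9749476, 9843336, 9932887,
           10031646, 10154788, 10268233, 10381615, 10488084),
  ye   = 1:10
)

fit_year <- lm(pop ~ year, data = this)
fit_ye   <- lm(pop ~ ye,   data = this)

new_years <- c(2020, 2021)

# Predict from the calendar-year model
pred_year <- predict(fit_year, newdata = data.frame(year = new_years))

# Predict from the ye model; convert calendar years to the ye scale first
pred_ye <- predict(fit_ye, newdata = data.frame(ye = new_years - 2009))

# The two sets of predictions should agree up to floating-point error
print(all.equal(pred_year, pred_ye))
print(round(pred_year))
```

Whichever column you fit on, the new data passed to predict() has to use the same column name and the same scale as the model's predictor.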

eleventhend

We can do this with predict.

# Fit on the calendar year, then predict for the new years
model <- lm(pop ~ year, data = this)
predict(model, newdata = data.frame(year = c(2020, 2021)))
       1        2 
10572162 10674701 
Ian Campbell