My data set looks as follows:
country year Var1 Var2 Var3 Var4
1 AT 2010 0.27246094 15 0 0
2 BE 2010 0.14729459 53 0 1
3 BG 2010 0.08744856 3 0 0
4 CY 2010 0.15369261 6 0 0
5 CZ 2010 0.20284360 6 0 1
6 DE 2010 0.12541694 37 0 0
7 AT 2011 0.35370741 16 0 0
8 BE 2011 0.14572864 54 0 0
9 BG 2011 0.11929461 4 0 0
10 CY 2011 0.24550898 7 0 1
11 CZ 2011 0.23333333 7 0 0
12 DE 2011 0.21943574 38 0 0
13 AT 2012 0.35073780 17 0 0
14 BE 2012 0.19700000 55 0 0
15 BG 2012 0.08472803 5 0 0
16 CY 2012 0.16949153 8 0 0
17 CZ 2012 0.26914661 8 0 0
18 DE 2012 0.22037422 39 0 0
19 AT 2013 0.34716599 18 0 1
20 BE 2013 0.28906250 56 0 0
21 BG 2013 0.14602216 6 0 1
22 CY 2013 0.44023904 9 0 0
23 CZ 2013 0.35146022 9 0 1
24 DE 2013 0.25500323 40 0 1
It covers 4 years for each of the 6 countries.
What I want to do is run a regression of Var2 on Var1 (Var2 ~ Var1).
Since I have multiple years I considered using time series. So, first I changed the year column from character to date:
library(dplyr)
# assign the result back, otherwise the converted column is discarded
df <- mutate(df, year = as.Date(year, format = "%Y"))
Then, I tried to run my regression and received this error:
library(plm)
reg1 <- plm(Var2 ~ Var1 + Var3 + Var4, data = df)
summary(reg1)
Error in pdim.default(index[[1]], index[[2]]) : duplicate couples (id-time)
Did I miss a step before running the regression or am I just using the wrong function?
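For comparison, here is a minimal sketch of a plm call that names the panel index explicitly. It assumes that country and year together identify each row; the toy data frame, its values, and the choice of model = "within" are made up for illustration and are not taken from my real data.

```r
library(plm)

# Hypothetical toy panel (invented values): 3 countries x 4 years
set.seed(1)
toy <- expand.grid(country = c("AT", "BE", "BG"), year = 2010:2013,
                   stringsAsFactors = FALSE)
toy$Var1 <- runif(nrow(toy))
toy$Var2 <- 10 + 5 * toy$Var1 + rnorm(nrow(toy))

# Name the id and time columns explicitly instead of letting plm
# take the first two columns of the data frame as the index
fit <- plm(Var2 ~ Var1, data = toy,
           index = c("country", "year"), model = "within")
summary(fit)
```

If the first two columns of the data frame are not the id and time variables (for example, because of an extra row-number column), passing index this way may avoid plm treating the wrong columns as the panel index.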
I also tried running the regression with the lmer function (using time() and a random intercept to control for country differences):
library(lme4)
library(lmerTest)
reg2 <- lmer(Var2 ~ time(Var1) + Var3 + Var4 + (1 | country), data = df, REML = FALSE)
summary(reg2)
Here I got a result, but I am not at all sure whether this is the right way to do it. Is this a valid approach, or does it estimate something different from what I intend?
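A quick check of what time() returns when applied to a plain numeric vector, which is what Var1 is in the formula above. This is only a sketch using the first three Var1 values from the table; the object names are hypothetical.

```r
# First three Var1 values from the data set above
v <- c(0.27246094, 0.14729459, 0.08744856)

# time() on a plain numeric vector returns the observation positions,
# not the values themselves and not the year
idx <- time(v)
print(idx)

# Compare with the original values
print(as.numeric(idx) == v)
```

As far as I can tell, this means time(Var1) enters the model as a row counter rather than as Var1 or as the year, so the coefficient on it may not mean what I assumed.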