Regression for multiple countries over time

Question

My data set looks as follows:

    country year    Var1        Var2 Var3 Var4
1   AT      2010    0.27246094  15   0    0 
2   BE      2010    0.14729459  53   0    1 
3   BG      2010    0.08744856  3    0    0 
4   CY      2010    0.15369261  6    0    0 
5   CZ      2010    0.20284360  6    0    1 
6   DE      2010    0.12541694  37   0    0 
7   AT      2011    0.35370741  16   0    0 
8   BE      2011    0.14572864  54   0    0 
9   BG      2011    0.11929461  4    0    0 
10  CY      2011    0.24550898  7    0    1 
11  CZ      2011    0.23333333  7    0    0 
12  DE      2011    0.21943574  38   0    0 
13  AT      2012    0.35073780  17   0    0 
14  BE      2012    0.19700000  55   0    0 
15  BG      2012    0.08472803  5    0    0 
16  CY      2012    0.16949153  8    0    0 
17  CZ      2012    0.26914661  8    0    0 
18  DE      2012    0.22037422  39   0    0
19  AT      2013    0.34716599  18   0    1 
20  BE      2013    0.28906250  56   0    0 
21  BG      2013    0.14602216  6    0    1 
22  CY      2013    0.44023904  9    0    0 
23  CZ      2013    0.35146022  9    0    1 
24  DE      2013    0.25500323  40   0    1

It covers 4 years for each of the 6 countries.

What I want to do is run the regression Var2 ~ Var1.

Since I have multiple years, I considered treating the data as a time series. So I first tried to convert the year column from character to Date:

library(dplyr)
mutate(testdf, year = as.Date(year, format= "%Y"))

Then, I tried to run my regression and received this error:

library(plm)
reg1 <- plm(Var2 ~ Var1 + Var3 + Var4, data = df)
summary(reg1)

Error in pdim.default(index[[1]], index[[2]]) : duplicate couples (id-time)

Did I miss a step before running the regression or am I just using the wrong function?

I also tried to run the regression with the lmer function (using time, and controlling for country differences):

library(lme4)
library(lmerTest)
reg2 <- lmer(Var2 ~ time(Var1) + Var3 + Var4 + (1 | country), data = df, REML = F)
summary(reg2)

Here I got a result, but I am completely unsure whether this is the way it should be done. Would this be a possibility or is it something different?

plm requires each id-time pair to be unique; see https://stackoverflow.com/questions/43663594/error-in-plm-regression. Honestly, I am not sure whether lmerTest should be used here. — maarvd, Mar 03 '20 at 10:18
Your `mutate` doesn't change anything as shown. If you did `testdf$year <- mutate(testdf, ...)`, do `testdf <- mutate(testdf, ...)` instead. Else, your code works for me. — jay.sf, Mar 03 '20 at 10:44
Thanks for your reply. Do you know a way around this issue? Unfortunately I am a bit helpless on this. — adi agius, Mar 03 '20 at 11:00
I tried `testdf <- mutate(testdf, year=as.Date(year, format = "%Y"))`, but it gave me this error: `Error: Evaluation error: do not know how to convert 'year' to class "Date".` — adi agius, Mar 03 '20 at 11:04

score 0 · Answer 1 · answered Mar 03 '20 at 11:39

A Date needs a month and a day as well as a year. I suggest using the beginning of each year, built with ISOdate.

## Note: transform() is from base R
testdf <- transform(testdf, year=as.Date(ISOdate(year, 1, 1)))

head(testdf, 3)
#   country       year       Var1 Var2 Var3 Var4
# 1      AT 2010-01-01 0.27246094   15    0    0
# 2      BE 2010-01-01 0.14729459   53    0    1
# 3      BG 2010-01-01 0.08744856    3    0    0
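
As a small illustration of the conversion step, here is a base-R sketch. The vector `years` is a made-up stand-in for the year column, not part of the original data. It shows two ways of pinning each year to 1 January that should agree, and how to get the plain year back afterwards:

```r
# Hypothetical stand-in for the integer year column of testdf
years <- c(2010L, 2011L, 2012L, 2013L)

# Option 1: ISOdate() builds a date-time from year, month, day
d1 <- as.Date(ISOdate(years, 1, 1))

# Option 2: paste a fixed month and day onto the year, then parse
d2 <- as.Date(paste0(years, "-01-01"))

# Both give 1 January of each year
format(d1)
all(d1 == d2)

# Recover the plain integer year from the Date vector if needed
back <- as.integer(format(d1, "%Y"))
identical(back, years)
```

Since the data below keeps year as an integer, plm may well accept it as the time index without any conversion; the Date step matters mainly if other code expects real dates.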

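In the plm call you probably want to set index= explicitly and choose a model=; see ?plm. Without index=, plm assumes that the first two columns of the data hold the individual and time indexes.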

library(plm)
reg1 <- plm(Var2 ~ Var1 + Var3 + Var4, data=testdf, index=c("country", "year"), 
            model="random")
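
The error in the question, duplicate couples (id-time), means that some combination of the two index columns occurs more than once. A quick base-R check along these lines can confirm that the pairs are unique before calling plm. The small data frames `panel` and `panel_bad` are invented for illustration only:

```r
# Invented mini panel: 3 countries observed in 2 years
panel <- data.frame(
  country = rep(c("AT", "BE", "BG"), times = 2),
  year    = rep(c(2010L, 2011L), each = 3)
)

# TRUE for every row whose (country, year) pair already appeared earlier
dup <- duplicated(panel[c("country", "year")])
any(dup)

# Cross-tabulation: a clean panel has at most one row per cell
counts <- table(panel$country, panel$year)
max(counts)

# Appending a repeated pair makes the check fire on the new row
panel_bad <- rbind(panel, data.frame(country = "AT", year = 2010L))
bad_rows <- which(duplicated(panel_bad[c("country", "year")]))
bad_rows
```

If `bad_rows` is non-empty on the real data, either the index columns are not the ones intended or the data need to be aggregated or de-duplicated first.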

Result:

summary(reg1)
# Oneway (individual) effect Random Effect Model 
# (Swamy-Arora's transformation)
# 
# Call:
# plm(formula = Var2 ~ Var1 + Var3 + Var4, data = testdf, model = "random", 
#     index = c("country", "year"))
# 
# Balanced Panel: n = 6, T = 4, N = 24
# 
# Effects:
#                    var  std.dev share
# idiosyncratic   0.8135   0.9019 0.001
# individual    615.6029  24.8113 0.999
# theta: 0.9818
# 
# Residuals:
#      Min.   1st Qu.    Median   3rd Qu.      Max. 
# -1.416570 -0.789216 -0.064901  0.728004  1.392325 
# 
# Coefficients:
#             Estimate Std. Error z-value  Pr(>|z|)    
# (Intercept) 18.47629    9.76600  1.8919    0.0585 .  
# Var1        12.95722    2.84290  4.5577 5.171e-06 ***
# Var4         0.32221    0.40056  0.8044    0.4212    
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Total Sum of Squares:    32.753
# Residual Sum of Squares: 15.806
# R-Squared:      0.5174
# Adj. R-Squared: 0.47144
# Chisq: 22.5147 on 2 DF, p-value: 1.2912e-05
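
Note that the coefficient table above has no row for Var3. In the data, Var3 is 0 in every row, so it carries no information and was presumably dropped during estimation. A base-R check for such constant regressors might look like the sketch below. The values are copied from the six rows for 2010, and `is_constant` is a hypothetical helper name:

```r
# The six rows for 2010 from the example data
sub <- data.frame(
  Var1 = c(0.27246094, 0.14729459, 0.08744856,
           0.15369261, 0.20284360, 0.12541694),
  Var3 = c(0L, 0L, 0L, 0L, 0L, 0L),
  Var4 = c(0L, 1L, 0L, 0L, 1L, 0L)
)

# Hypothetical helper: TRUE when a column takes a single value
is_constant <- function(x) length(unique(x)) == 1L

# Names of the columns that never vary
constant_cols <- names(sub)[vapply(sub, is_constant, logical(1))]
constant_cols
```

Running the same check on the full data frame before fitting makes it clear up front which regressors cannot be estimated.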

Data:

testdf <- structure(list(country = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 
1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 
5L, 6L), .Label = c("AT", "BE", "BG", "CY", "CZ", "DE"), class = "factor"), 
    year = c(2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2011L, 
    2011L, 2011L, 2011L, 2011L, 2011L, 2012L, 2012L, 2012L, 2012L, 
    2012L, 2012L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L), 
    Var1 = c(0.27246094, 0.14729459, 0.08744856, 0.15369261, 
    0.2028436, 0.12541694, 0.35370741, 0.14572864, 0.11929461, 
    0.24550898, 0.23333333, 0.21943574, 0.3507378, 0.197, 0.08472803, 
    0.16949153, 0.26914661, 0.22037422, 0.34716599, 0.2890625, 
    0.14602216, 0.44023904, 0.35146022, 0.25500323), Var2 = c(15L, 
    53L, 3L, 6L, 6L, 37L, 16L, 54L, 4L, 7L, 7L, 38L, 17L, 55L, 
    5L, 8L, 8L, 39L, 18L, 56L, 6L, 9L, 9L, 40L), Var3 = c(0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Var4 = c(0L, 1L, 0L, 0L, 
    1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 
    0L, 1L, 0L, 1L, 1L)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24"
))
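
To explore the model= choice, the same formula can be fitted under several of plm's estimators. This is only a sketch under a few assumptions: it rebuilds the data in compact form so that it runs on its own, it leaves out Var3 (0 in every row), and it skips the fitting step if plm is not installed. The names `models` and `fits` are illustrative:

```r
# Compact rebuild of the example data (same values as the structure() call)
testdf <- data.frame(
  country = rep(c("AT", "BE", "BG", "CY", "CZ", "DE"), times = 4),
  year    = rep(2010:2013, each = 6),
  Var1 = c(0.27246094, 0.14729459, 0.08744856, 0.15369261, 0.20284360, 0.12541694,
           0.35370741, 0.14572864, 0.11929461, 0.24550898, 0.23333333, 0.21943574,
           0.35073780, 0.19700000, 0.08472803, 0.16949153, 0.26914661, 0.22037422,
           0.34716599, 0.28906250, 0.14602216, 0.44023904, 0.35146022, 0.25500323),
  Var2 = c(15, 53, 3, 6, 6, 37, 16, 54, 4, 7, 7, 38,
           17, 55, 5, 8, 8, 39, 18, 56, 6, 9, 9, 40),
  Var4 = c(0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0,
           0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1)
)

# Fit only when plm is available (assumption: it may not be installed)
if (requireNamespace("plm", quietly = TRUE)) {
  library(plm)
  models <- c("pooling", "within", "random")
  fits <- lapply(models, function(m)
    plm(Var2 ~ Var1 + Var4, data = testdf,
        index = c("country", "year"), model = m))
  names(fits) <- models
  # Var1 slope under each estimator, side by side
  print(sapply(fits, function(f) coef(f)[["Var1"]]))
}
```

Which estimator is appropriate depends on whether the country effects can be assumed uncorrelated with the regressors; that is a modelling decision, not something this comparison settles by itself.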
