1

For my bachelor thesis I have a regression with fixed-effects and time-fixed effects in years:

log⁡(production_it )= β_0 + β_1 * log⁡(temp_it ) + β_2 * log⁡(rain_it ) + β_3 * drought_it + β_4 * flood_it + β_5 * storm_it + β_6 * log⁡(labour_it )+ β_7* log⁡(Fertilitzer_it )+ β_8* log⁡(capital_it )+ β_9* log⁡(area_it )+ η_t+ u_i+ ε_it

where

i: country, t: year

r1.time.fixed <- plm(log(production) ~ log(temp) + log(rain) + drought + flood + 
                       storm + log(labour) + log(fertilizer) + log(capital) +
                       log(area), data=pm.rich, model="within", effect="twoways")

Now I want to create the following regression with decade instead of year as fixed-effect:

log⁡(production_id )= β_0 + β_1 * log⁡(temp_id ) + β_2 * log⁡(rain_id ) + β_3 * drought_id + β_4 * flood_id + β_5 * storm_id + β_6 * log⁡(labour_id )+ β_7* log⁡(Fertilitzer_id )+ β_8* log⁡(capital_id )+ β_9* log⁡(area_id )+ η_d+ u_i+ ε_id

where

i:country, d: decade

How can I create the decade fixed-effects in r, given a panel data set based on year data?

Here you can find my data I use:

enter image description here

Helix123
  • 3,502
  • 2
  • 16
  • 36
mag123
  • 21
  • 3

1 Answers1

1

First, you want the averages for each country in each decade as single observations, where you probably want the mean values for each country in each decade. We do this outside plm using aggregate, where we paste0 together the first three substrings of the years with a zero, e.g. 1935 → 1930. Let me show you with the Grunfeld data that comes with plm:

library(plm)
data(Grunfeld)
Grunfeld <- transform(Grunfeld, decade=paste0(substr(year, 1, 3), "0"))
head(Grunfeld, 3)
#   firm year   inv  value capital decade
# 1    1 1935 317.6 3078.5     2.8   1930
# 2    1 1936 391.8 4661.7    52.6   1930
# 3    1 1937 410.6 5387.1   156.9   1930
dim(Grunfeld)
# [1] 200   6

This allows us to aggregate along the decades:

Grunfeld.a <- aggregate(. ~ firm + decade, Grunfeld, mean)
head(Grunfeld.a, 3)
#   firm decade year    inv   value capital
# 1    1   1930 1937 341.70 4046.54  124.98
# 2    2   1930 1937 305.56 1921.00  159.06
# 3    3   1930 1937  49.60 2057.12  129.80
dim(Grunfeld.a)
# [1] 30  6

Now we could just put the aggregated Grunfeld.a into the original plm call, since plm internally does some "magic" to recognize the unit and time variables. However, I consider that as dangerous and recommend to explicitly state the index=es in the plm call (see an earlier answer of mine for a comprehensive explanation):

## unit and year FE
fit.year <- plm(inv ~ value + capital, data=Grunfeld, index=c("firm", "year"), 
                model="within", effect="twoways")

## unit and decade FE
fit.decade <- plm(inv ~ value + capital, data=Grunfeld.a, index=c("firm", "decade"), 
                model="within", effect="twoways")

For the statistical summary you may want to use robust standard errors clustered by unit (i.e. by country in your case, or firm in this example). plm comes with a summary.plm method that allows to customize the vcov= using vcovHC.plm. Note that cluster=c("group") confusingly means that you want to cluster by the variable you defined as unit variable, (i.e. country in your case). For the estimation type= we might want to use "HC3" as it is standard in the sandwich package and often recommended today.

## unit and year FE
summary(fit.year, vcov=vcovHC(fit.year, type="HC3", cluster=c("group")))$coe
#          Estimate Std. Error t-value     Pr(>|t|)
# value   0.1177159 0.01212638 9.70742 5.539985e-18
# capital 0.3579163 0.05915972 6.05000 9.049182e-09

## unit and decade FE
summary(fit.decade, vcov=vcovHC(fit.decade, type="HC3", cluster=c("group")))$coe
#          Estimate Std. Error  t-value     Pr(>|t|)
# value   0.1541480 0.05601229 2.752039 1.417452e-02
# capital 0.3476384 0.06517729 5.333735 6.719637e-05
jay.sf
  • 60,139
  • 8
  • 53
  • 110