R - Panel data FE, no unique time-id match, creating new time-variable

Question

I am working with a highly disaggregated, unbalanced panel data set in long format on vehicle sales and want to run a FE regression model. The data are structured as follows (the real data contain further variables, but they are irrelevant here):

cars <- data.frame(make = c("Audi", "Audi", "Audi", "Audi", "Audi", "Audi", "Audi", "Audi", "Audi", "Audi", "Opel", "Opel", "Opel", "Opel"),
               model = c("a1", "a1", "a1", "a1", "a1", "a1", "a3", "a3", "a3", "a3", "Corsa", "Corsa", "Corsa", "Corsa"),
               trim = c("Sport", "Business", "Sport", "Business","Sport", "Business", "Cross", "Street", "Corss", "Street", "O1", "O2", "O1", "O2"),
               tax = c(100, 200, 100, 200, 100, 200, 500, 600,500, 600, 50, 30, 50, 30),
               sales = c(1000, 1500, 800, 1300, 1100, 1000, 50, 70, 30, 20, 5000, 2000, 3000, 3000),
               time = c(1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 4, 4, 5, 5))

I hope you get the idea.

So basically I have panel data with trim and time as the index variables. I want to examine the impact of the tax on sales. For that, I want to run a FE regression with model and time fixed effects, because I only want to consider the variation in sales and tax within a car model and a time period. The FE capture the remaining variation, in which I am not interested. To do so, I wanted to use

plm(sales ~ tax, 
         data = cars,
         model = "within",
         index = c("time", "model"),
         effect = "twoways")

But this does not work because of the data structure, which uses a narrower definition of vehicle (according to the trim variable). Thus, I have multiple rows with the same model for a single period (no unique id-time match). To overcome this issue, I thought about creating a new time variable, because I do not want to aggregate my data from trim level to model level. I am lacking a bit of imagination about the requirements for the new time variable and how to create it. Ideally, this would result in

plm(sales ~ tax, 
         data = cars,
         model = "within",
         index = c("new time variable", "model"),
         effect = "twoways")

But I also wonder if I could easily solve my problem with

lm(sales ~ tax + factor(time) + factor(model), data = cars)

Does anybody have a suggestion, either on my simple idea to overcome this problem or on the idea of creating a new time variable to run the plm command (or even a completely new idea or another package)? Moreover, would it make sense to run a FE regression with time-model FE as well as trim-FE?
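For reference, the non-unique model-time pairs described above can be checked directly in base R. This is only a sketch on the example data (reduced to the two columns needed); the names `cell_counts` and `has_duplicates` are made up for illustration.

```r
# Example data from the question, reduced to the columns needed here
cars <- data.frame(
  model = c(rep("a1", 6), rep("a3", 4), rep("Corsa", 4)),
  time  = c(1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 4, 4, 5, 5)
)

# Number of rows per model-time cell; a count above 1 means that
# (model, time) does not identify a single observation
cell_counts <- table(cars$model, cars$time)
cell_counts

# TRUE if at least one model-time pair occurs more than once
has_duplicates <- any(duplicated(cars[, c("model", "time")]))
has_duplicates
```

Any cell with a count above 1 is what plm reports as "duplicate couples" when model and time are used as the index.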

The variable for the time index goes into the 2nd slot of the argument `index` of function `plm`. — Helix123, Apr 15 '21 at 14:44
Thank you, good point. However, it does not solve the problem, I still get the error message of "duplicate couples". — Joschka, Apr 15 '21 at 17:28
About your question: I see two approaches: i) create a new individual index made of model and trim; ii) average (or aggregate in some other way) your data over the different trims per model within the same time period, such that you have only one observation per model and time period. — Helix123, Apr 20 '21 at 18:29
The fundamental issue is reflected in the output of the lm approach you suggest: `summary(lm(sales ~ tax + factor(time) + factor(model), data = cars))` - look at the coefficient with `NA`. — Helix123, Apr 20 '21 at 18:32
But tax varies within model (because of different taxes for different trims). So this is not an issue. — Joschka, Apr 24 '21 at 20:18
I do not want to aggregate my data, I am very happy about that level of detail of my data. I do not exactly see how the approach with creating a new index based on model and trim would work but this definitely goes in the correct direction. — Joschka, Apr 24 '21 at 20:20
I posted an answer to your question showing how to construct the new individual index and what happens to variable `tax` due to the within transformation. — Helix123, Apr 26 '21 at 17:56
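The `NA` coefficient mentioned in the comments can be traced to the design of the example data: Corsa is observed only in periods 4 and 5, and no other model is observed in those periods. A minimal sketch, assuming only the example data (the helper names `corsa`, `t45` and `X` are illustrative):

```r
# Example data from the question, reduced to the columns needed here
cars <- data.frame(
  model = c(rep("a1", 6), rep("a3", 4), rep("Corsa", 4)),
  time  = c(1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 4, 4, 5, 5)
)

# Which model is observed in which period
table(cars$model, cars$time)

# The Corsa dummy equals the sum of the time-4 and time-5 dummies
corsa <- as.numeric(cars$model == "Corsa")
t45   <- as.numeric(cars$time %in% c(4, 5))
identical(corsa, t45)

# Hence the dummy design matrix has one column more than its rank
X <- model.matrix(~ factor(time) + factor(model), data = cars)
c(columns = ncol(X), rank = qr(X)$rank)
```

Because of this exact linear dependence, `lm` cannot separate one of the dummies from the others and reports it as `NA`.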

LucaCoding · Answer 1 · 2021-04-15T14:21:53.443


Please add a small reproducible dataset and try to be clearer about what you want.

I'll try to answer anyway. A fixed effects model is appropriate if your variables vary over time; if you have time-invariant variables, you might prefer a pooled OLS or random effects model (please check here: Econometrics Academy).

The plm package is the right one; however, I believe there is an error in how you run the regression. My suggestion is below:

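    library(plm)
    # `data` is your data frame (`cars` in the question's example);
    # the index is given as c(individual, time)
    p.data <- pdata.frame(data, index = c("trim", "time"))

    # `lt_tax` is the tax regressor (`tax` in the question's example).
    # effect = "twoways" already includes time effects, so an extra
    # factor(time) term is not needed; attach() and cbind() are not needed either.
    model1 <- plm(sales ~ lt_tax, data = p.data, model = "within", effect = "twoways")
    summary(model1)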

Hope this is useful.

edited Apr 15 '21 at 14:21

answered Apr 15 '21 at 10:39


Thanks for your quick answer. I edited my initial question and added some reproducible data. The problem with your suggested code is that my data is not defined by model and time as index variables but by trim and time. So that does not work. I did not get why you would not add "factor(model)"; I think my underlying idea (as I now added) is correct, isn't it? – Joschka Apr 15 '21 at 13:58
Thanks @Joschka for improving your question. I changed the code in my reply (trim as the cross-sectional dimension and time as the time-series dimension), but it still doesn't work with fixed effects. I believe it is more of an econometrics issue. However, the code works with pooled OLS. – LucaCoding Apr 15 '21 at 14:28
With "trim" as the cross-sectional unit it works for me. However, I do not want "trim" as the cross-sectional unit, I want "model" as the cross-sectional unit. And then it does not work, because of "duplicate couples". – Joschka Apr 15 '21 at 17:27

Helix123 · Answer 2 · 2021-04-26T20:19:35.117

Since you asked how a new individual index can be constructed from model and trim, here is how to do that. However, note that your variable tax does not vary over time within a model-trim combination (you can check this via, e.g., pvar or by looking at the model matrix after the within transformation). Thus, your within model with model-trim as the individual index is non-estimable.

cars <- data.frame(make = c("Audi", "Audi", "Audi", "Audi", "Audi", "Audi", "Audi", "Audi", "Audi", "Audi", "Opel", "Opel", "Opel", "Opel"),
                   model = c("a1", "a1", "a1", "a1", "a1", "a1", "a3", "a3", "a3", "a3", "Corsa", "Corsa", "Corsa", "Corsa"),
                   trim = c("Sport", "Business", "Sport", "Business","Sport", "Business", "Cross", "Street", "Corss", "Street", "O1", "O2", "O1", "O2"),
                   tax = c(100, 200, 100, 200, 100, 200, 500, 600,500, 600, 50, 30, 50, 30),
                   sales = c(1000, 1500, 800, 1300, 1100, 1000, 50, 70, 30, 20, 5000, 2000, 3000, 3000),
                   time = c(1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 4, 4, 5, 5))


# See NA coefficient in two-way LSDV model
summary(lm(sales ~ tax + factor(time) + factor(model), data = cars))
#> 
#> Call:
#> lm(formula = sales ~ tax + factor(time) + factor(model), data = cars)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -1470.3  -135.1     0.0   131.3  1470.3 
#> 
#> Coefficients: (1 not defined because of singularities)
#>                     Estimate Std. Error t value Pr(>|t|)  
#> (Intercept)          762.884    904.730   0.843   0.4270  
#> tax                    2.972      5.056   0.588   0.5750  
#> factor(time)2       -117.500    569.739  -0.206   0.8425  
#> factor(time)3       -158.750    753.695  -0.211   0.8392  
#> factor(time)4       2618.219    936.655   2.795   0.0267 *
#> factor(time)5       2118.219    936.655   2.261   0.0582 .
#> factor(model)a3    -2296.476   2100.974  -1.093   0.3106  
#> factor(model)Corsa        NA         NA      NA       NA  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 805.7 on 7 degrees of freedom
#> Multiple R-squared:  0.8291, Adjusted R-squared:  0.6827 
#> F-statistic: 5.662 on 6 and 7 DF,  p-value: 0.01921


# make new individual index from model and trim
cars$modeltrim <- paste(cars$model, cars$trim, sep = "_")

# formula one-way within via LSDV
form <- sales ~ tax +  factor(modeltrim)
summary(lm(form, data = cars))
#> 
#> Call:
#> lm(formula = form, data = cars)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -1000.0  -131.2    12.5   108.3  1000.0 
#> 
#> Coefficients: (1 not defined because of singularities)
#>                             Estimate Std. Error t value Pr(>|t|)   
#> (Intercept)                 2717.647    518.058   5.246  0.00119 **
#> tax                           -7.255      3.319  -2.186  0.06509 . 
#> factor(modeltrim)a1_Sport  -1025.490    463.745  -2.211  0.06267 . 
#> factor(modeltrim)a3_Corss    939.804   1396.610   0.673  0.52259   
#> factor(modeltrim)a3_Cross    959.804   1396.610   0.687  0.51405   
#> factor(modeltrim)a3_Street  1680.294   1637.233   1.026  0.33890   
#> factor(modeltrim)Corsa_O1   1645.098    584.414   2.815  0.02596 * 
#> factor(modeltrim)Corsa_O2         NA         NA      NA       NA   
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 618.1 on 7 degrees of freedom
#> Multiple R-squared:  0.8994, Adjusted R-squared:  0.8132 
#> F-statistic: 10.44 on 6 and 7 DF,  p-value: 0.003392


# one-way within model via plm
library(plm)
plm(sales ~ tax, 
    data = cars,
    model = "within",
    index = c("modeltrim", "time"),
    effect = "individual")
#> Error in plm.fit(data, model, effect, random.method, random.models, random.dfcor, : empty model

plm(sales ~ tax,
    data = cars,
    model = "within",
    index = c("modeltrim", "time"),
    effect = "twoways")
#> Error in plm.fit(data, model, effect, random.method, random.models, random.dfcor, : empty model

# tax does not vary over time within modeltrim (i.e., it is constant per individual) - within model non-estimable
pvar(cars, index = c("modeltrim", "time"))
#> no time variation:       make model trim tax modeltrim 
#> no individual variation: make time
#
# look at variable tax after one-way within transformation
pcars <- pdata.frame(cars, index = c("modeltrim", "time"))
mf <- model.frame(pcars, sales ~ tax)
model.matrix(mf, model = "within")[ , "tax"]
#>  1  2  3  4  5  6  7  8  9 10 11 12 13 14 
#>  0  0  0  0  0  0  0  0  0  0  0  0  0  0
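The same result can be reproduced without plm by demeaning `tax` within each model-trim group by hand. This is a base-R sketch of the one-way within transformation on the example data, not plm's internal code; `tax_within` is an illustrative name.

```r
# Example data from the question, reduced to the columns needed here
cars <- data.frame(
  model = c(rep("a1", 6), rep("a3", 4), rep("Corsa", 4)),
  trim  = c("Sport", "Business", "Sport", "Business", "Sport", "Business",
            "Cross", "Street", "Corss", "Street", "O1", "O2", "O1", "O2"),
  tax   = c(100, 200, 100, 200, 100, 200, 500, 600, 500, 600, 50, 30, 50, 30)
)

# New individual index built from model and trim
cars$modeltrim <- paste(cars$model, cars$trim, sep = "_")

# One-way within transformation: subtract the group mean of tax
# within each model-trim combination
tax_within <- cars$tax - ave(cars$tax, cars$modeltrim)
tax_within

# TRUE when no within-group variation is left to estimate a slope from
all(tax_within == 0)
```

Since the demeaned regressor is zero for every row, there is nothing left to identify the coefficient on `tax`, which matches the "empty model" error above.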
