I am working with a highly disaggregated, unbalanced panel data set in long format on vehicle sales and want to run a fixed-effects (FE) regression. The data are structured as follows (the real data contain further variables, but they are irrelevant here):
cars <- data.frame(make  = c("Audi", "Audi", "Audi", "Audi", "Audi", "Audi", "Audi", "Audi", "Audi", "Audi", "Opel", "Opel", "Opel", "Opel"),
                   model = c("a1", "a1", "a1", "a1", "a1", "a1", "a3", "a3", "a3", "a3", "Corsa", "Corsa", "Corsa", "Corsa"),
                   trim  = c("Sport", "Business", "Sport", "Business", "Sport", "Business", "Cross", "Street", "Cross", "Street", "O1", "O2", "O1", "O2"),
                   tax   = c(100, 200, 100, 200, 100, 200, 500, 600, 500, 600, 50, 30, 50, 30),
                   sales = c(1000, 1500, 800, 1300, 1100, 1000, 50, 70, 30, 20, 5000, 2000, 3000, 3000),
                   time  = c(1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 4, 4, 5, 5))
I hope you get the idea.
So basically I have panel data with trim and time as the index variables. I want to examine the impact of the tax on sales. For that I want to run an FE regression with model and time fixed effects, because I only want to use the variation in sales and tax within a car model and a time period. The fixed effects absorb the remaining variation, which I am not interested in. To do so, I wanted to use
library(plm)
plm(sales ~ tax,
    data = cars,
    model = "within",
    index = c("model", "time"),  # plm expects the individual index first, then the time index
    effect = "twoways")
But this does not work because the data are defined at a finer level than the model, namely the trim. Thus, I have multiple rows with the same model in a single period (no unique id-time match). To overcome this issue I thought about creating a new time variable, because I do not want to aggregate my data from trim level to model level. I am lacking a bit of imagination about what the new time variable would have to look like and how to create it. Ideally, this would result in
plm(sales ~ tax,
    data = cars,
    model = "within",
    index = c("model", "new time variable"),  # individual index first, then the time index
    effect = "twoways")
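The duplicate problem can be made concrete with a quick check. The following is only a minimal sketch in base R: it rebuilds the index columns of the toy data, counts the rows that repeat a model-time pair, and compares that with a trim-level id. The column name `trim_id` and the two counter variables are made up for illustration and are not part of the original data.

```r
# Toy data rebuilt from the example above (only the index columns)
cars <- data.frame(
  make  = rep(c("Audi", "Opel"), times = c(10, 4)),
  model = rep(c("a1", "a3", "Corsa"), times = c(6, 4, 4)),
  trim  = c(rep(c("Sport", "Business"), 3),
            rep(c("Cross", "Street"), 2),
            rep(c("O1", "O2"), 2)),
  time  = c(1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 4, 4, 5, 5)
)

# Rows that repeat an already-seen model-time pair
n_dup_model <- sum(duplicated(cars[, c("model", "time")]))

# Hypothetical unit id: one level per make-model-trim combination
cars$trim_id <- interaction(cars$make, cars$model, cars$trim, drop = TRUE)

# Rows that repeat an already-seen trim_id-time pair
n_dup_trim <- sum(duplicated(cars[, c("trim_id", "time")]))

print(c(model_time = n_dup_model, trim_time = n_dup_trim))
```

In the toy data, 7 of the 14 rows repeat a model-time pair, while the trim-level id combined with time has no repeats, so only the latter gives a unique id-time match.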
But I also wonder whether I could simply solve my problem with
lm(sales ~ tax + factor(time) + factor(model), data = cars)
Does anybody have a suggestion, either on my simple lm idea or on creating a new time variable so that the plm command runs (or a completely different idea or another package)? Moreover, would it make sense to run an FE regression with model-time FE as well as trim FE?
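One point that may matter for the trim-FE question: a trim fixed effect absorbs everything that is constant within a trim, so the tax coefficient can only be estimated if tax changes within at least some trims. A minimal base R sketch of that check, using only the trim and tax columns of the toy data (the name `tax_values_per_trim` is made up for illustration):

```r
# Toy data rebuilt from the example above (only trim and tax)
cars <- data.frame(
  trim = c(rep(c("Sport", "Business"), 3),
           rep(c("Cross", "Street"), 2),
           rep(c("O1", "O2"), 2)),
  tax  = c(100, 200, 100, 200, 100, 200, 500, 600, 500, 600, 50, 30, 50, 30)
)

# Number of distinct tax values observed for each trim
tax_values_per_trim <- tapply(cars$tax, cars$trim,
                              function(x) length(unique(x)))

print(tax_values_per_trim)
```

In the toy data every trim has exactly one tax value, so trim FE would leave no variation in tax to estimate from. Whether that also holds in the full data set is something this sketch cannot say.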