I have the following data structure:
- 186 unique firm acquisitions
- Observations for 5 years per firm: 2 years before the acquisition year, the acquisition year itself, and 2 years after
- Total number of observations is thus 186 * 5 = 930
- Two dependent variables, which I would like to use in separate analyses: one is binary (1/0), the other is a ratio of two variables that ranges from 0 to 5.
- Acquisition years range from 2008 to 2019
- Acquisitions took place in 20 different industries
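A quick way to confirm that the data really have this shape is sketched below. It assumes the deal identifier is deal_id and the calendar year is year (the names used in the xtset call later in this post), and that it is run on the raw data before any observations are dropped.

```stata
* Sketch only: run on the raw data, before any outlier removal.
* Each deal-year combination should appear exactly once.
isid deal_id year
* Every deal should contribute five yearly observations.
bysort deal_id: assert _N == 5
* 186 deals x 5 years = 930 observations in total.
assert _N == 930
```

If any of these checks fails, Stata stops with an error, which points to duplicate deal-years or an unbalanced panel.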
Goal: test whether there are significant differences in target characteristics (the two DVs mentioned above) after acquisition vs before acquisition.
I expect the following unobserved factors to exist that can bias results:
- Deal-specific: some deals involve characteristics that others do not
- Target-specific: some targets might be more difficult to change, for example. Also, some targets get acquired twice in the period I am examining, so the results will be biased unless I control for that.
- Acquirer-specific: some acquirers are more likely to implement change than others. Also, some acquirers engage in multiple acquisitions during the period I am examining (the maximum is 9).
- Industry-specific: there might have been unobserved industry trends that made targets in certain industries more likely to change than targets in other industries.
- Year-specific: since the acquisitions took place in different years between 2008 and 2019, observations might be biased by unobserved year-specific factors. For example, 2020 and 2021 observations will likely be affected by the COVID-19 pandemic.
I have constructed a dummy variable, post, which is coded 1 for year 1 and year 2 after acquisition, and 0 for year 1 and year 2 before acquisition.
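For reference, a minimal sketch of how such a post dummy can be built. The variable names acq_year (the acquisition year of the deal) and rel_year are hypothetical, and leaving the acquisition year itself as missing is an assumption, since the description above only says how the two years before and the two years after are coded.

```stata
* Hypothetical names: acq_year and rel_year are assumptions, not from the data set.
* Year relative to the acquisition: -2, -1, 0, 1, 2
gen rel_year = year - acq_year
* post = 1 for years +1 and +2, 0 for years -1 and -2,
* and missing for the acquisition year (rel_year == 0).
gen byte post = inrange(rel_year, 1, 2) if inlist(rel_year, -2, -1, 1, 2)
* Cross-check the coding, including the missing category.
tabulate rel_year post, missing
```

With this coding, estimation commands drop the acquisition-year rows because post is missing there, so at most 186 * 4 = 744 observations would enter a regression that includes post.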
I have been struggling to choose the right models and commands in Stata. This is the code I have been using:
BINARY DV
First, I ran an OLS regression so that I could remove outliers after the regression:
reg Y1 post X1 c.post#c.X1 $controls i.industry i.year
Then, I removed outliers (not sure if this is the right method though):
predict estu if e(sample), rstudent
drop if abs(estu) > 3.5 & !missing(estu)
Then I ran the xtprobit regression below:
egen id = group(target_id acquiror_id)
xtset deal_id year
xtprobit Y1 post X1 c.post#c.X1 $controls i.industry i.year, vce(cluster id)
OTHER DV
Same as above, but replacing xtprobit with xtreg and Y1 with Y2.
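Spelled out, the Y2 analysis looks roughly like the sketch below. It assumes that id has already been created with the egen command from the binary-DV section, writes the interaction in factor-variable notation (c.post#c.X1), and stores the studentized residuals in estu2, a hypothetical name chosen so that it does not clash with an existing estu.

```stata
* Outlier screen on the OLS fit for Y2 (same doubt about the method as for Y1)
reg Y2 post X1 c.post#c.X1 $controls i.industry i.year
predict estu2 if e(sample), rstudent
* Missing values count as larger than any number in Stata, so exclude them
* explicitly to avoid dropping rows that were not in the estimation sample.
drop if abs(estu2) > 3.5 & !missing(estu2)

* Panel model for the ratio DV; xtreg fits random effects unless told otherwise
xtset deal_id year
xtreg Y2 post X1 c.post#c.X1 $controls i.industry i.year, vce(cluster id)
```

Note that vce(cluster id) with a random-effects xtreg requires each deal_id panel to be nested within a single id cluster; otherwise Stata stops with an error.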
Although I get results which theoretically make sense, I feel like I am messing things up.
Any thoughts on how to improve my code?