
I want to determine the gender wage gap and use the reghdfe command to obtain a linear estimate of this:

ln_real_wage is the natural log of real annual wages. male equals 1 for male individuals and 0 otherwise. high_skilled_chen equals 1 for individuals working in a high-skill industry and 0 otherwise. age is the age of the individual, and full_time equals 1 when an individual works full time and 0 otherwise.

eststo: reghdfe ln_real_income i.male##i.high_skilled_chen age age_2 i.full_time, absorb(i.province) vce(cluster id_numeric)

This is the output:

[Image: reghdfe output from the code above]

Then I compute predictive margins to obtain the difference between men and women depending on the sector they work in (high-skill or not).

eststo margin: margins, over(i.male i.high_skilled_chen) post

[Image: margins output]

How do I obtain the % difference in wages between males and females working in the high-skill sector and the % difference between males and females working in a low-skill sector (using the predictive margins)?

Michelle13

1 Answer

Your base group is low-skilled female workers. You can re-run the regression with the base group changed to either skilled male workers or skilled female workers. In my example I set skilled male workers as the base category (b1 changes the base level from 0 to 1): i.male##i.high_skilled_chen -> ib1.male#ib1.high_skilled_chen.
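As a sketch, the re-parameterized regression could look like this. It reuses the variable names and options from the question and changes only the factor-variable term. The `///` line continuation assumes the command is run from a do-file.

```stata
* Same specification as in the question, but with skilled men
* (male = 1, high_skilled_chen = 1) as the omitted base cell.
reghdfe ln_real_income ib1.male#ib1.high_skilled_chen age age_2 i.full_time, ///
    absorb(i.province) vce(cluster id_numeric)
```

With this parameterization each reported cell (0 0, 0 1, 1 0) is a log-wage difference relative to skilled men.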

The coefficient of male#high_skilled_chen in the line 0 1 can be transformed to give you the percentage difference between high-skilled male and female workers. Given the sizes of the coefficients, I would not use the 100*b percent approximation but the exact (exp(b)-1)*100 percent interpretation (you can read the details e.g. here). Replace b with the coefficient of 0.male#1.high_skilled_chen. If, for example, the coefficient is 0.3, then you could say that skilled female workers earn on average 35 % more than skilled male workers.
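A minimal sketch of that transformation, assuming the regression with ib1.male#ib1.high_skilled_chen has just been run and its estimates are still in memory. The value 0.3 is only the illustrative number used above, not an estimate. nlcom relies on the delta method, so its standard errors are approximate.

```stata
* Illustrative value only: a coefficient of 0.3 gives roughly 35 percent
display (exp(0.3) - 1) * 100

* High-skill sector: skilled women relative to skilled men (the base cell)
nlcom (exp(_b[0.male#1.high_skilled_chen]) - 1) * 100

* Low-skill sector: low-skilled women relative to low-skilled men,
* taken as the difference between the two low-skill cell coefficients
nlcom (exp(_b[0.male#0.high_skilled_chen] - _b[1.male#0.high_skilled_chen]) - 1) * 100
```

Both nlcom results are percentage differences of women relative to men; swapping the order of the two coefficients in the last line would express the gap relative to women instead.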

As a side note, it is better to use ppmlhdfe with untransformed wages instead of reghdfe with log wages, because you might have people who do not work and thus have an annual income of zero, which cannot be modeled with log wages. On a more technical note, if the errors in your original model exhibit heteroskedasticity, then the OLS estimates of the log-level model you use are inconsistent (see e.g. here). By specifying cluster() you implicitly assume that heteroskedasticity is present.
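A rough sketch of the ppmlhdfe alternative, assuming the community-contributed ppmlhdfe package is installed and that the data contain wages in levels. The name real_wage is hypothetical and does not appear in the question. The `///` line continuation assumes a do-file.

```stata
* real_wage is a hypothetical name for untransformed real annual wages
* (zeros are allowed here, unlike with the log specification)
ppmlhdfe real_wage ib1.male#ib1.high_skilled_chen age age_2 i.full_time, ///
    absorb(province) vce(cluster id_numeric)

* Coefficients are still on the log scale, so the same transformation applies
nlcom (exp(_b[0.male#1.high_skilled_chen]) - 1) * 100
```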

  • Thanks so much @Maultasche. Is there a way to obtain the % difference between males and females using the predictive margins? For example, `((exp(11.531) - exp(11.465))/exp(11.465))*100`? – Michelle13 Jun 23 '23 at 09:46
  • I think that working with predictive margins might lead to errors due to Jensen's inequality: exponentiating a predicted mean of log wages does not give the mean of wages. This is also explained in this [article](https://www.stata.com/stata-news/news34-2/spotlight/). However, you can still use the predictive margins from `ppmlhdfe`, because that command estimates the same model on the original scale (so Jensen's inequality is no longer an issue). –  Jun 24 '23 at 14:47