I have a dataset in Stata that looks something like this:

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
         dv2 |      1,904    .5395645     .427109  -1.034977   1.071396
        xvar |      1,904    3.074055    1.387308          1          5

with xvar being a categorical independent variable and dv2 a dependent variable of interest.

I am estimating a simple model in which the categorical variable enters as a set of dummies, with category D (level 4) as the base:

reg dv2 ib4.xvar
eststo myest

      Source |       SS           df       MS      Number of obs   =     1,904
-------------+----------------------------------   F(4, 1899)      =     13.51
       Model |  9.60846364         4  2.40211591   Prob > F        =    0.0000
    Residual |  337.540713     1,899  .177746558   R-squared       =    0.0277
-------------+----------------------------------   Adj R-squared   =    0.0256
       Total |  347.149177     1,903  .182422058   Root MSE        =     .4216

------------------------------------------------------------------------------
         dv2 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        xvar |
          A  |    .015635   .0307356     0.51   0.611     -.044644     .075914
          B  |   .1435987    .029325     4.90   0.000     .0860861    .2011113
          C  |   .1711176   .0299331     5.72   0.000     .1124124    .2298228
          E  |   .1337754   .0295877     4.52   0.000     .0757477    .1918032
             |
       _cons |    .447794    .020191    22.18   0.000     .4081952    .4873928
------------------------------------------------------------------------------

These are the results. As you can see, B, C and E have larger effects than D, which is the excluded category.

However, coefplot does not account for the fact that, for a categorical variable, each category's effect is composite: true_A = D + A, where D is the constant.

coefplot myest, scheme(s1color) vert

[Figure: coefplot of myest, vertical layout]

As you can see, the plot shows the constant as the largest coefficient, while the others are smaller.

Is there a systematic way I can adjust for this problem and plot the true coefficients and SEs of each category?

Thanks a lot for your help

Alex
  • If D is the reference category, it shouldn't have a coefficient associated with it. It appears that the value labelled 'D' in the plot is in fact the constant (0.44), which suggests some mis-labelling has occurred. I would double check your code for your `coefplot` command. – Bicep Oct 18 '22 at 00:39
  • Apologies, if this is mistaken, but my understanding of regressions where the independent variable is a categorical one, is that the constant term is the estimated effect (conditional mean of y given x) for the excluded category: am I wrong? – Alex Oct 18 '22 at 08:10
  • We need to be clear about the difference between plotting coefficients and plotting estimated effects. Coefficients are the values output in your regression table and can be plotted using `coefplot`, while estimated effects (i.e. in a linear regression, adding the constant coefficient and the coefficient from a level of the categorical independent variable) can be plotted using `margins` and `marginsplot`. Check out the help files for those commands. – Bicep Oct 18 '22 at 23:43
  • Dear @Cybernike, thanks a lot for your suggestion: it is indeed very helpful. Is there any chance you would be able to illustrate how this works? I am not very familiar with the Stata environment – Alex Oct 20 '22 at 14:40

1 Answer

In response to your second comment, here is an example of how you can use marginsplot to plot estimated effects from a linear regression.

sysuse auto, clear
replace price = price/100
reg price i.rep78, cformat(%9.2f)

------------------------------------------------------------------------------
       price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       rep78 |
          2  |      14.03      23.56     0.60   0.554       -33.04       61.10
          3  |      18.65      21.76     0.86   0.395       -24.83       62.13
          4  |      15.07      22.21     0.68   0.500       -29.31       59.45
          5  |      13.48      22.91     0.59   0.558       -32.28       59.25
             |
       _cons |      45.65      21.07     2.17   0.034         3.55       87.74
------------------------------------------------------------------------------

margins i.rep78, cformat(%9.2f)

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       rep78 |
          1  |      45.65      21.07     2.17   0.034         3.55       87.74
          2  |      59.68      10.54     5.66   0.000        38.63       80.73
          3  |      64.29       5.44    11.82   0.000        53.42       75.16
          4  |      60.72       7.02     8.64   0.000        46.68       74.75
          5  |      59.13       8.99     6.58   0.000        41.18       77.08
------------------------------------------------------------------------------

Note that these margins are the constant plus the appropriate coefficient. For example, for rep78 = 2 the margin is 45.65 + 14.03 = 59.68.

Running marginsplot straight after margins then produces the following plot, which includes the marginal estimates and their confidence intervals:

marginsplot

[Figure: marginsplot of the predicted price for each level of rep78, with confidence intervals]
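
The same steps might carry over to the model in the question roughly as follows. This is only a sketch: dv2 and xvar are the variable names from the question, level 1 is assumed to be the category labelled A, the stored-estimate name catmeans is made up for illustration, and coefplot is assumed to be installed (it is a community-contributed command).

```stata
* Sketch for the model in the question; dv2 and xvar are the asker's variables.

* Fit the model with level 4 (category D) as the base level
regress dv2 ib4.xvar

* Check one category by hand: constant + coefficient on level 1
* (assumes level 1 is the category labelled A)
lincom _cons + 1.xvar

* Predicted mean of dv2 for every category, with delta-method SEs and CIs
margins xvar
marginsplot

* Alternative: post the margins as estimation results so that coefplot
* plots the category means instead of the raw coefficients.
* "catmeans" is a hypothetical name chosen for this example.
margins xvar, post
estimates store catmeans
coefplot catmeans, vertical
```

The lincom line comes before margins, post on purpose: the post option replaces the regression results in memory with the margins, so _cons is no longer available afterwards.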

Bicep