I have a .dta file of loan data over the course of 10 years, and want to run a Fama-Macbeth regression on the data to estimate risk premiums on loan returns.

For a quick overview of what Fama-MacBeth regression is, here's an excerpt from an older Stack Overflow post:

Fama-MacBeth regression refers to a procedure for running regressions on panel data (where there are N different individuals and each individual is observed over multiple periods T, e.g. days, months, years). So in total there are N x T observations. Note that it's OK if the panel is not balanced. The procedure first runs a cross-sectional regression for each period, i.e. it pools the N individuals in a given period t, and does this for t = 1, ..., T, so T regressions are run in total. This gives a time series of coefficients for each independent variable, on which hypothesis tests can be performed. Usually the average is taken as the final coefficient for each independent variable, and t-stats are used to test significance.
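
To make the two steps in the excerpt concrete, here is a minimal sketch in plain NumPy/pandas. It assumes an intercept in every cross-sectional regression and unadjusted (non-Newey-West) standard errors; the function name `fama_macbeth_sketch` and the toy panel are hypothetical and are not taken from asreg or linearmodels.

```python
import numpy as np
import pandas as pd


def fama_macbeth_sketch(df, time_col, y_col, x_cols):
    """Two-step Fama-MacBeth: per-period OLS, then average the coefficients."""
    rows = []
    for _, group in df.groupby(time_col):
        # Step 1: cross-sectional OLS with an intercept for this period
        X = np.column_stack([np.ones(len(group)), group[x_cols].to_numpy(dtype=float)])
        y = group[y_col].to_numpy(dtype=float)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rows.append(beta)
    betas = pd.DataFrame(rows, columns=["intercept", *x_cols])

    # Step 2: time-series mean, standard error of the mean, and t-statistic
    mean = betas.mean()
    std_error = betas.std(ddof=1) / np.sqrt(len(betas))
    return pd.DataFrame({"mean": mean, "std_error": std_error, "tstat": mean / std_error})


# Hypothetical toy panel: 50 individuals, 24 periods, true slope of 2 on "x"
rng = np.random.default_rng(0)
panel = pd.DataFrame({
    "period": np.repeat(np.arange(24), 50),
    "x": rng.normal(size=24 * 50),
})
panel["y"] = 1.0 + 2.0 * panel["x"] + rng.normal(scale=0.5, size=len(panel))

summary = fama_macbeth_sketch(panel, "period", "y", ["x"])
print(summary)
```

The averaged slope on `x` should land close to the true value of 2, since each period's estimate is an independent noisy draw around it.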

This process can be done in Stata using the asreg command. Running this on the data after declaring it as a panel gives us:

. sort FacilityID yyyymm

. xtset FacilityID yyyymm

Panel variable: FacilityID (unbalanced)
 Time variable: yyyymm, 199908 to 200911, but with gaps
         Delta: 1 unit


.  asreg ExcessRet1 Mom STM, fmb newey(3)

Fama-MacBeth Two-Step procedure (Newey SE)       Number of obs     =     58608
(Newey-West adj. Std. Err. using lags(3))        Num. time periods =       124
                                                 F(  2,   121)     =      4.69
                                                 Prob > F          =    0.0109
                                                 avg. R-squared    =    0.1284
                                                 Adj. R-squared    =    0.1245
------------------------------------------------------------------------------
             |              Newey-FMB
  ExcessRet1 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         Mom |    5.44422   1.793302     3.04   0.003     1.893906    8.994534
         STM |   .8705018   2.164802     0.40   0.688    -3.415295    5.156298
        cons |  -.0756198   .1027633    -0.74   0.463    -.2790669    .1278273
------------------------------------------------------------------------------
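
For reference, newey(3) requests Newey-West standard errors with 3 lags on the time series of period-by-period coefficients. Below is a minimal sketch of that kind of adjustment for the mean of a single coefficient series, assuming a Bartlett kernel and no small-sample correction. The function name `newey_west_se_of_mean` is hypothetical, and the exact scaling used by asreg or linearmodels may differ.

```python
import numpy as np


def newey_west_se_of_mean(series, lags):
    """Newey-West (Bartlett kernel) standard error of a time-series mean."""
    x = np.asarray(series, dtype=float)
    n = len(x)
    u = x - x.mean()

    # Lag-0 term: the ordinary (population) variance of the series
    long_run_var = u @ u / n

    # Add Bartlett-weighted autocovariances up to the chosen lag
    for j in range(1, lags + 1):
        weight = 1.0 - j / (lags + 1)
        gamma_j = u[j:] @ u[:-j] / n
        long_run_var += 2.0 * weight * gamma_j

    return np.sqrt(long_run_var / n)


# Assumed toy series standing in for one coefficient's per-period estimates
coef_series = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
se_no_lags = newey_west_se_of_mean(coef_series, lags=0)
se_one_lag = newey_west_se_of_mean(coef_series, lags=1)
print(se_no_lags, se_one_lag)
```

With `lags=0` this reduces to the usual standard error of the mean (with a 1/n variance), so any gap between the two numbers comes from autocorrelation in the coefficient series.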

However, running the same procedure in Python using the FamaMacBeth class from the linearmodels package (documentation here) gives very different results.

The dataframe was imported into Python using pd.read_stata(). After declaring the data as a panel, I ran the regression as follows:

from linearmodels.panel.model import FamaMacBeth

factors = ["Mom", "STM"]
# linearmodels expects a MultiIndex of (entity, time)
df = df.set_index(["FacilityID", "yyyymm"])
formula = "ExcessRet1 ~ 1 + " + " + ".join(factors)

reg = FamaMacBeth.from_formula(formula, data=df)
res = reg.fit(cov_type="kernel", kernel="newey-west")
print(res)

prints

                             Parameter Estimates                              
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
Intercept     -0.2338     0.1223    -1.9107     0.0561     -0.4736      0.0060
Mom            5.8947     2.0287     2.9057     0.0037      1.9184      9.8709
STM            3.9649     3.1279     1.2676     0.2050     -2.1659      10.096
==============================================================================

There is a significant difference between the results of the two regressions. Other solutions (such as the one posted in the older Stack Overflow post mentioned earlier, and an implementation of the regression I found on GitHub) all give the same results as the linearmodels attempt.

What is causing this difference in the results? Is it a change in the procedure? How would I have to change my code to get the results from the Stata implementation?

1 Answer

I cannot replicate the difference on my end, so something may be going wrong in your setup. Below I create some simulated data and post the results from my asreg program and from fama_macbeth in the finance_byu library.

Results from the finance_byu library

import numpy as np
import pandas as pd
from finance_byu.fama_macbeth import fama_macbeth, fm_summary

n_firms = 100
n_periods = 100

def firm(fid):
    # One firm's panel: a random return and three random factors per period
    f = pd.DataFrame(np.random.random((n_periods, 4)))
    f['period'] = f.index
    f['firmid'] = fid
    return f

df = pd.concat([firm(i) for i in range(n_firms)])
df = df.rename(columns={0: 'ret', 1: 'exmkt', 2: 'smb', 3: 'hml'})
df.head()

         ret       exmkt         smb       hml  period  firmid
0   0.607847    0.264077    0.158241    0.025651    0       0
1   0.140113    0.215597    0.262877    0.953297    1       0
2   0.504742    0.531757    0.812430    0.937104    2       0
3   0.709870    0.299985    0.080907    0.624482    3       0
4   0.682049    0.455993    0.230743    0.368847    4       0

result = fama_macbeth(df,'period','ret',['exmkt','smb','hml'],intercept=True)
fm_summary(result)


              mean      std_error     tstat
intercept   0.483657    0.009682    49.956515
exmkt       0.017926    0.009364     1.914239
smb        -0.001474    0.010007    -0.147283
hml         0.001873    0.010330     0.181276

Results from asreg

# Python: save the DataFrame as a Stata data file
df.to_stata("example.dta")

* Stata: load the file and declare the panel
use "example.dta"
tsset firmid period

asreg ret exmkt smb hml, fmb
Fama-MacBeth (1973) Two-Step procedure           Number of obs     =     10000
                                                 Num. time periods =       100
                                                 F(  3,    96)     =      1.23
                                                 Prob > F          =    0.3016
                                                 avg. R-squared    =    0.0293
                                                 Adj. R-squared    =   -0.0010
------------------------------------------------------------------------------
             |            Fama-MacBeth
         ret | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       exmkt |   .0179255   .0093643     1.91   0.059    -.0006625    .0365136
         smb |  -.0014739    .010007    -0.15   0.883    -.0213375    .0183898
         hml |   .0018726   .0103299     0.18   0.857    -.0186322    .0223773
        cons |    .483657   .0096816    49.96   0.000     .4644393    .5028748
------------------------------------------------------------------------------

Both the regression coefficients and the standard errors are identical in the output of asreg and the finance_byu library.

Here is the official page where more examples and uses of asreg can be found.

  • Thank you so much for reaching out. I know it's a bit late, but it turned out that my code had a mistake during data preprocessing, it wasn't an issue with either package. Sorry about the confusion! – Kailash Seshadri Dec 18 '22 at 13:39
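
Since the discrepancy came from preprocessing rather than from either package, one quick check is to compare the pandas sample against the header of the Stata output (Number of obs, Num. time periods) before fitting anything. The comment does not say what the actual mistake was, so this is only a generic sketch. The helper name `panel_summary` is hypothetical and the toy frame is made up, with column names borrowed from the question.

```python
import pandas as pd


def panel_summary(df, entity_col, time_col, cols):
    """Count what a regression on `cols` would actually use after dropping NaNs."""
    used = df.dropna(subset=cols)
    return {
        "n_obs": len(used),
        "n_periods": used[time_col].nunique(),
        "n_entities": used[entity_col].nunique(),
        # Repeated (entity, time) keys usually point to a merge or reshape problem
        "n_duplicate_keys": int(used.duplicated([entity_col, time_col]).sum()),
    }


# Made-up toy data with one missing return and one duplicated (entity, time) key
toy = pd.DataFrame({
    "FacilityID": [1, 1, 2, 2, 2],
    "yyyymm": [199908, 199909, 199908, 199909, 199909],
    "ExcessRet1": [0.1, 0.2, None, 0.3, 0.3],
    "Mom": [1.0, 1.1, 0.9, 1.2, 1.2],
})

report = panel_summary(toy, "FacilityID", "yyyymm", ["ExcessRet1", "Mom"])
print(report)
```

If `n_obs` or `n_periods` differs from what Stata reports for the same regression, the two programs are not being fed the same sample, and the coefficients should not be expected to match.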