
I run a simple regression in Stata for two subsamples and afterwards I want to exclude all observations with standardized residuals larger than 3.0. I tried:

regress y x if subsample_criteria==1
gen st_res1=e(rsta)
regress y x if subsample_criteria==0
gen st_res2=e(rsta)
drop if st_res1 | st_res2 > 3.0

However, the new variables are full of missing values: the standardized residuals are not stored in st_res1 and st_res2.

I am grateful for any hints!

jeffrey
  • If you think that outliers are a real problem and will need to be removed, `regress` is not implementing the best method. The genuine outliers will exert leverage on the fit and won't necessarily have the largest residuals after the fit. Consider robust-resistant regression, transformations, etc. This is independent of any programming question, but should still be of concern. – Nick Cox Nov 04 '15 at 08:20
  • Minimal research would have been to look at the list of returned results in the manual or the output of `ereturn list`, which would have led to the discovery that `e(rsta)` doesn't exist. – Steve Samuels Nov 06 '15 at 15:21
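Nick Cox's suggestion can be explored with Stata's built-in robust regression command rreg, which iteratively downweights observations with large residuals rather than dropping them at a fixed cutoff. A minimal sketch on the auto data; the weight variable name w and the 0.5 cutoff used only to list the downweighted cars are arbitrary illustrative choices:

```stata
sysuse auto, clear
* robust regression: Huber then biweight iterations downweight outlying cars
rreg price mpg, genwt(w)          // w holds the final weight given to each observation
* hypothetical cutoff, only to inspect which cars were heavily downweighted
list make price mpg w if w < 0.5
```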

2 Answers


The problem with your code is that Stata does not know what e(rsta) is (and neither do I), so it returns a missing value, which Stata treats as a very large positive number. All missing values are greater than 3, so your condition is true for every observation instead of singling out the large residuals.
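Both points are easy to check interactively. A small sketch (the exact ereturn list output depends on your Stata version):

```stata
sysuse auto, clear
regress price mpg
ereturn list          // shows what regress actually stores; there is no e(rsta)
display e(rsta)       // a nonexistent e() scalar evaluates to missing (.)
display (. > 3)       // missing compares greater than any number, so this displays 1
```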

Ignoring the statistical merits of doing this, here's one way:

sysuse auto, clear
reg price mpg 
predict ehat, rstandard       // standardized residuals from the fit above
reg price mpg if abs(ehat)<3  // refit without observations whose |residual| >= 3

Note that I am using the absolute value of the residual, which I think makes more sense here.
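The question fits separate regressions for two subsamples. One way to extend the approach above, sketched here with foreign standing in for the question's subsample_criteria and with hypothetical variable names rst and tmp, is to collect each subsample's in-sample standardized residuals in a single variable and screen on it:

```stata
sysuse auto, clear
gen double rst = .                          // will hold in-sample standardized residuals
foreach g in 0 1 {
    regress price mpg if foreign == `g'     // foreign stands in for subsample_criteria
    predict double tmp if e(sample), rstandard
    replace rst = tmp if e(sample)
    drop tmp
}
* rst is missing for observations used in neither fit; the guard keeps them
drop if abs(rst) > 3 & !missing(rst)
```

Keeping one residual variable means a single condition covers both subsamples, and the !missing() guard avoids the "missing is greater than 3" trap described above.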

dimitriy
  • Posted while I was composing my own response. Good point on the missing `st_res` variables being interpreted as > 3. – Brendan Nov 03 '15 at 21:58

First, providing an MCVE (minimal, complete, and verifiable example) is always a good first step, and fairly easy given Stata's sysuse and webuse commands. Now, on to the question.

See help regress postestimation and help predict for the proper syntax for generating new variables with residuals, etc. The syntax is a bit different from the gen command, as you will see below.

Note also that your drop if condition does not do what you intend. Because > binds more tightly than |, it is interpreted as drop if st_res1 != 0 | st_res2 > 3.0, and a missing st_res1 counts as nonzero, so every observation with a missing st_res1 is dropped. (I also assume you want to drop standardized residuals < -3.0; if not, you can remove the abs() function.)
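The precedence and missing-value behaviour can be seen on a one-observation toy dataset (the values are made up purely for illustration; display uses the first observation of each variable):

```stata
clear
set obs 1
gen st_res1 = .       // mimics the all-missing variable from the question
gen st_res2 = 0.5     // a small, unremarkable residual
display (st_res1 | st_res2 > 3.0)                   // 1: read as st_res1 | (st_res2 > 3.0), missing counts as true
display ((st_res1 > 3.0) | (st_res2 > 3.0))         // still 1, because missing > 3.0 is true
display (abs(st_res2) > 3.0 & !missing(st_res2))    // 0: the guarded test behaves as intended
```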

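sysuse auto , clear
* plant artificial outliers: two huge mpg values and one tiny one
replace mpg = 10000 in 1/2
replace mpg = 0.0001 in 70

reg mpg weight if foreign
predict rst_for , rstandard    // without if e(sample), computed for every observation

reg mpg weight if !foreign
predict rst_dom , rstandard    // likewise computed for every observation

drop if abs(rst_for) > 3.0 | abs(rst_dom) > 3.0    // abs() catches both tails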

Postscript: Note that you may also consider adding if e(sample) to your predict commands, depending on whether you wish to extrapolate the results of the subsample regression to the entire sample and evaluate all residuals, or whether you only wish to drop observations based on in-sample standardized residuals.
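If you do add if e(sample), rst_for becomes missing for domestic cars and rst_dom for foreign cars, and since abs(.) > 3.0 is true, the drop line above would then remove every observation. A sketch of the in-sample variant with explicit !missing() guards (same planted outliers as above):

```stata
sysuse auto, clear
replace mpg = 10000 in 1/2
replace mpg = 0.0001 in 70

reg mpg weight if foreign
predict double rst_for if e(sample), rstandard   // missing for domestic cars

reg mpg weight if !foreign
predict double rst_dom if e(sample), rstandard   // missing for foreign cars

* without the !missing() guards, every observation would satisfy the condition
drop if (abs(rst_for) > 3.0 & !missing(rst_for)) | (abs(rst_dom) > 3.0 & !missing(rst_dom))
```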

Brendan