
I am looking for a way to estimate several models (let's say 10) in Stata and save a certain parameter value from each estimation in a vector.

I'm more of an R guy, so here is a very simple working example in R:

n1 <- 100  # observations per group
n2 <- 10   # number of groups
group <- rep(1:n2, each = n1)
data <- data.frame(y = rnorm(n1 * n2), x = rnorm(n1 * n2), group = group)
val <- unique(data$group)
estimates <- numeric(length(val))

for (i in seq_along(val)) {
    # element [2] of coef() is the slope on x
    estimates[i] <- coef(lm(y ~ x, data = data, subset = group == val[i]))[2]
}

Alternatively

library(nlme)
mod1 <- lmList(y~x | group, data=data)
coef(mod1)[,2]

And yes, unfortunately I need to use Stata :-(

Grzegorz Adam Kowalski
Druss2k
  • is `statsby` the right approach? – Druss2k Jan 17 '13 at 14:12
  • 4
    Your question is posed as if it were aimed at people fluent in both R and Stata, so that they can understand the R code and immediately translate to a Stata equivalent. Also your personal dig at Stata, however humorously intended, is not well advised. A better tactic would be to show also your Stata code so far, so that Stata people can see quickly what you are trying to do. But yes: if your models are fitted separately to different subsets, then `statsby` is a good approach to collate parameter estimates. – Nick Cox Jan 17 '13 at 14:30
  • It was not my intention to nag at anyone. But I assumed that if I posted the whole R code as an example, the logical response would be "Why don't you do it in R?" The reason I posted the R code at all is that I wanted to provide some kind of example. – Druss2k Jan 18 '13 at 00:25

2 Answers


What is your ultimate objective? The paradigms of Stata and R are different, so knowing the ultimate goal would help. In R I tend to think in terms of vectors, but not in Stata (vectors don't really exist in Stata). If you want a table, then I suggest the estout package from SSC (ssc install estout). If you just want the coefficients as an end in themselves, then I suggest statsby.

clear
version 11.2
set seed 2001

* generate your data
set obs 1000
generate y = rnormal()
generate x = rnormal()
generate group = 1 + floor((_n - 1) / 100)

* if you want a table
* you'll need the estout package from SSC (ssc install estout)
eststo clear
forvalues i = 1/10 {
    eststo : regress y x if (group == `i')
}
esttab

* if you just want the coefficients
statsby, by(group) clear : regress y x
list

Both esttab and statsby have lots of options, so check out the help files.


Update: It seems you want time-series betas by group (here a firm). In terms of economics, I think you would want rolling regressions, but this framework should get you started.

clear
version 11.2
set seed 2001

* generate your data
set obs 1000
generate y = rnormal()
generate x = rnormal()
generate firm = 1 + floor((_n - 1) / 100)
generate year = 1 + mod((_n - 1), 100)

* regress by firm
xtset firm year
statsby _b, by(firm) saving(temp, replace) : regress y x

* then merge back
merge m:1 firm using temp
list in 1/20
Richard Herron
  • Hi. My ultimate goal is to regress a variable y on a variable x, then to repeat this procedure as many times as I have groups within my data. Subsequently, the slope estimates from these regressions are supposed to be stored somewhere so that I can access them to perform an additional regression: a pooled variable will then be regressed onto the coefficients received from the previous regressions. The reason I struggle is that I do rather well in R but cannot "speak" Stata at all. I already noticed that the paradigm is very different. – Druss2k Jan 18 '13 at 00:31
  • 2
    This supports the suggestions of Richard Herron and myself: look at `statsby` in particular. – Nick Cox Jan 18 '13 at 00:56
  • I did but if I try this `statsby "xtreg y x" _b[x], by(group) clear` to save the values in a new data set somehow it does not return the values. – Druss2k Jan 18 '13 at 01:10
  • I thought this `statsby _b[x], by(group) saving(statby): xtreg y x` will do it but as it seems it just saves a new file called statby.dta which contains the original data. – Druss2k Jan 18 '13 at 01:51
  • 1
    @Druss2k What defines your individual and time for `xtreg`? Does this technique have a name or reference? – Richard Herron Jan 18 '13 at 02:09
  • The observations are uniquely identified by the group (which is nothing else than a company) and the year. This technique does not really have a name. To make more sense of it, imagine one wants to analyze the connection between revenue and investment. The variable which corresponds to the influence, the slope coefficient, is then used again in another regression. I'm just helping out a colleague and he wants to do it this way, so I am just trying to do the programming for him. – Druss2k Jan 18 '13 at 02:21
  • Oh, thanks very much :-). I thought that since xtreg does a GLS estimation and it is a time series within each group, this regression method would be the right thing to do. But great, that is exactly what I've been looking for :-) – Druss2k Jan 18 '13 at 02:33
  • Shouldn't I use `xtgls` instead of `regress`, which does ordinary OLS? – Druss2k Jan 18 '13 at 02:40
  • 1
    I haven't used `xtgls`, but that does feasible GLS on a panel. With `statsby` you no longer have a panel -- you're running individual time series regressions. In finance I see Newey-West standard errors or clustering more often than I see FGLS. – Richard Herron Jan 18 '13 at 04:32
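Putting the comments together, the full two-stage workflow the question describes might be sketched as follows in Stata, continuing from the simulated y, x, firm data above. This is only a sketch under assumptions: `beta` is a name chosen here for the saved slope, and the second-stage variable `z` is a hypothetical placeholder for whatever firm-level outcome is regressed on the betas.

```stata
* first stage: one regression per firm, keeping only the slope on x
statsby beta=_b[x], by(firm) clear : regress y x

* the data in memory are now one row per firm: firm and beta
list in 1/5

* hypothetical second stage: regress a firm-level variable z on beta
* (z is a placeholder; merge it in from wherever that variable lives)
* regress z beta
```

Note that `clear` replaces the data in memory with the collected results; use `saving()` instead if you want to keep the original panel loaded.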

This calls for a multilevel model in which the level-1 regression is your firm-level regression, and the level-2 regression is the regression explaining variability between the group slopes. What you are doing is overly cumbersome, and would not give you the right standard errors anyway. This is most clearly implemented via gllamm, although you can probably twist xtmixed's arm to do that, too.
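As a sketch of what that might look like with the simulated data from the other answer: a random-intercept, random-slope model. The exact options here are an assumption, not a prescription (and `xtmixed` was later renamed `mixed` in Stata 13):

```stata
* level 1: y on x within each firm; level 2: intercept and slope
* vary across firms, with an unstructured covariance between them
xtmixed y x || firm: x, covariance(unstructured)

* firm-specific random-effect predictions (slope deviation, then
* intercept deviation) can be recovered after estimation
predict re_slope re_cons, reffects
```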

StasK
  • Very true. The only problem is that my colleague is kind of stubborn. I would not favor the method I described earlier, either. – Druss2k Jan 18 '13 at 16:16
  • 1
    @Druss2k The method you described is very common in finance (along with time-series averages of cross-sectional regression coefficients). That they beta from the first-stage is estimated doesn't necessarily affect the second stage, right? If your beta estimation error is white noise, then you have an errors-in-variables problem, which would bias-down your second stage coefficients, but is unavoidable. – Richard Herron Jan 18 '13 at 16:24
  • But if I had to pick a method, my guess is I would also go with something related to mixed-effects modelling. I don't quite get why someone would choose something different in this particular example. Honestly, I'm not quite sure how the second stage will affect the first stage since (I did not mention this before) the independent variable from which we pick the beta coefficient in the second stage is the dependent variable of the first stage. This could induce some sort of bias related to endogeneity. – Druss2k Jan 18 '13 at 20:11
  • 1
    Richard, by estimating everything simultaneously, you can both avoid the attenuation bias, and get the standard errors right. If the finance people don't do this, it is beyond me. At the very least, corrections for the standard errors for generated regressors have been out for a quarter century, see http://www.jstor.org/stable/1391724. – StasK Jan 18 '13 at 20:40
  • 1
    Thanks for the link, @StasK. I will check this out. Wait until you find out about Fama and MacBeth (1973) regressions. :) – Richard Herron Jan 18 '13 at 21:18
  • I guess we figured out why we had the huge financial crisis^^. In a recent work I was involved in, I had to handle exactly this kind of problem. FYI, the standard errors which took the generated regressors into consideration were five times the size of the standard errors from the OLS output. – Druss2k Jan 18 '13 at 21:18
  • Richard, I know about Fama and MacBeth alright. Since then, Stata and R appeared, you know. – StasK Jan 19 '13 at 23:49