I'm working with a large dataset (18 million observations).

I have one continuous variable, X, that I need to analyze by a factor variable with about 6,000 levels. I'd like help writing a loop that, for each level of the factor variable, computes the mean and standard deviation of X, saves them, and then moves on to the next level.

I'm thinking that a loop will solve this problem, as I need to repeat this procedure for ~6,000 levels of the factor. I especially would like to know how to tell Stata to move on to the next factor level once it finishes with the previous, and save the mean & standard deviation without over-writing these statistics for the previous factor level.

Ideally, I'd be able to save the output for each level of the factor to an Excel sheet, with each row representing one level of the factor, and three columns: A would be the factor level, B would be the mean and C would be the standard deviation.

1 Answer

No loop is needed; just use Stata's collapse command:

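* Simulate example data: x is the continuous variable, a is a factor with 60 levels
set seed 123
clear
set obs 100000
gen x = runiform()
gen a = runiformint(1,60)

* Reduce to one observation per level of a, holding the mean (m) and standard deviation (sd) of x
collapse (mean) m=x (sd) sd=x, by(a)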
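
The question also asks for an Excel sheet with one row per factor level. A minimal sketch of that step, assuming the collapse above has just run and using a hypothetical file name (factor_stats.xlsx is a placeholder, not something from the question):

```stata
* Assumes memory now holds the collapsed data: one row per level of a,
* with variables a, m, and sd. The file name is a hypothetical placeholder.
export excel a m sd using "factor_stats.xlsx", firstrow(variables) replace
```

This should write the factor level, mean, and standard deviation to columns A, B, and C, with variable names in the first row. Note that collapse replaces the data in memory, so wrap the collapse and export between preserve and restore if the original observations are still needed afterwards.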
Nick Cox
Robert Picard