When producing the data, you could use the more efficient -in- instead of -if-. But to be honest, I believe the data set would have to be very big for time differences to be perceivable. You can do some experimenting to check for that.
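One way to run that experiment is with Stata's timer command. This is only a sketch: the dataset size of 1,000,000 and the variable name x are arbitrary choices for illustration, and the timings will depend on your machine.

```stata
clear
set obs 1000000          // arbitrary size; increase it to see larger differences
gen byte x = 0

timer clear

// timer 1: -if- evaluates the condition for every observation
timer on 1
replace x = 1 if _n <= _N*.2
timer off 1

// timer 2: -in- only touches the requested range
timer on 2
replace x = 2 in 1/`=floor(_N*.2)'
timer off 2

timer list               // elapsed seconds for timers 1 and 2
```

Run the two blocks a few times before drawing conclusions, since a single run can be noisy.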
The second issue, random draws, is already addressed in a series of blog posts by Bill Gould (StataCorp's president). The code below covers both questions and has inline comments. You can run the whole thing and check the results.
clear
set more off
*----- first question -----
/* create data with certain distribution */
set obs 100
set seed 23956
gen obs = _n
gen rand = runiform()
sort rand
gen Color = ""
/*
// original
replace Color = "Blue" if _n <= _N*.2
replace Color = "Red" if _n > _N*.2 & _n <= _N*.5
replace Color = "Green" if Color==""
*/
// using -in-
replace Color = "Blue" in 1/`=floor(_N*.2)'
replace Color = "Red" in `=floor(_N*.2) + 1'/`=floor(_N*.5)'
replace Color = "Green" in `=floor(_N*.5) + 1'/L
/*
// using -cond()- (use this instead of -gen Color = ""- and the -replace- lines,
// because -gen- fails if Color already exists)
gen Color = cond(_n <= _N*.2, "Blue", cond(_n <= _N*.5, "Red", "Green"))
*/
drop rand
sort obs
tempfile allobs
save "`allobs'"
tab Color
*----- second question -----
/* draw without replacement a random sample of 20
observations from a dataset of N observations */
set seed 89365
sort obs // for reproducibility
generate double u = runiform()
sort u
keep in 1/20
tab obs Color
/* If N>1,000, generate two random variables u1 and u2
in place of u, and substitute sort u1 u2 for sort u */
/* draw with replacement a random sample of 20
observations from a dataset of N observations */
clear
set seed 08236
set obs 20
// 100 is N, the number of observations in the full dataset
generate long obsno = floor(100*runiform()+1)
sort obsno
tempfile obstodraw
save "`obstodraw'"
use "`allobs'", clear
generate long obsno = _n
merge 1:m obsno using "`obstodraw'", keep(match) nogen
tab obs Color
These and other details can be found in the four-part series on random-number generators by Bill Gould: http://blog.stata.com/2012/10/24/using-statas-random-number-generators-part-4-details/
See also -help sample-.
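For comparison, here is a minimal sketch of the same two draws using the built-in commands -sample- (without replacement) and -bsample- (with replacement). The toy dataset of 100 observations is rebuilt here only so the snippet runs on its own. The observations selected will not match those from the manual approach above, even with the same seeds, because the commands consume random numbers differently.

```stata
* without replacement: keep exactly 20 randomly chosen observations
clear
set obs 100
gen obs = _n
set seed 89365
sample 20, count         // -count- means 20 observations, not 20 percent
count

* with replacement: bootstrap sample of 20 observations
clear
set obs 100
gen obs = _n
set seed 8236
bsample 20
duplicates report obs    // shows whether any observation was drawn more than once
```

The manual approach is still useful when you want to see exactly how the draw is made or need to control the sort order for reproducibility.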