
I want to select my sample in Stata 13 based on three stratum variables with 12 strata in total (size - two strata; sector - three strata; intangible intensity - two strata). The selection should be proportional without replacement.

However, I can only find commands for disproportionate selection that select, for instance, x% of each stratum.

Can anyone help me out with this problem?

Nick Cox
Tobias
  • What is proportionate sampling except selecting the same fraction in each stratum? – Nick Cox May 05 '16 at 13:08
  • Proportionate means (at least from my understanding; please correct me if I'm wrong) that you select subjects in each stratum based on the distribution in the population, e.g. if 35 % of the population are large companies, then in the end 35 % of the firms in your sample should be large companies. – Tobias May 06 '16 at 07:04
  • In that case the stratification is irrelevant. I think the confusion here may be a statistical fallacy, that you want a random sample to be a miniature replica of the population. Search out a series of papers by Kruskal and Mosteller in _International Statistical Review_ 1979f. – Nick Cox May 06 '16 at 07:09
  • Could you please have a look at this one: http://www.ats.ucla.edu/stat/stata/faq/gsample.htm (section "Other gsample features"). Here it says the following: "Gsample is also capable of stratified and cluster sampling and these can be combined with the weights option." Is this statistical nonsense or did I get you wrong? – Tobias May 06 '16 at 08:09
  • Please make your questions self-contained and not dependent on reading external sources. More importantly, your question is now more statistical than programming and is to that extent off-topic here in my view. – Nick Cox May 06 '16 at 09:07
  • OK, then I would like to emphasize the question of how to implement the task described above in Stata. – Tobias May 06 '16 at 09:17
  • I don't see how this is anything but `sample` but I am not a sampling expert. – Nick Cox May 06 '16 at 09:23

2 Answers

0

Thank you for this discussion. I think I know where my problem was.

The command "gsample" can define strata based on several variables. Therefore, I thought I had to define three different stratum variables. But the solution is simpler.

There are 12 strata in total (the large firms with high intensity in sector 1, the small firms with high intensity in sector 1, and so on), with each firm falling into exactly one of the strata.

All I have to do is create a variable "strataident" with values from 1 to 12 identifying the different strata. I do this for the population dataset, so the number of firms falling into each stratum reflects the population. The following code then gives me a stratified random sample that is representative of the population.

gsample 10, percent strata(strataident) wor
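
The identifier used above can be built in one step with the `group()` function of `egen`. The following is only a minimal sketch on made-up data: the toy population, the seed and the numeric coding of `size`, `sector` and `intensity` are assumptions for illustration, not the real dataset. Note that `gsample` is a user-written command that has to be installed separately, so it is left out of the sketch.

```stata
* Toy population (hypothetical data; variable names follow the question)
clear
set seed 12345
set obs 1200
generate size      = 1 + floor(2 * runiform())
generate sector    = 1 + floor(3 * runiform())
generate intensity = 1 + floor(2 * runiform())

* One identifier per combination of the three stratum variables (1 to 12)
egen strataident = group(size sector intensity), label

* Check that all 12 cells exist and see each stratum's population share
tabulate strataident
```

Because `group()` numbers the observed combinations consecutively, any empty cell in the real data would show up here as fewer than 12 categories.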

This command also works and is much easier (see the example in 1):

gsample 10, percent wor strata(size sector intensity)
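
For comparison, the built-in `sample` command mentioned in the comments also draws without replacement and accepts a `by()` option, so it can apply the same percentage within each stratum. Drawing the same fraction f from every stratum gives n_h = f * N_h firms from stratum h, so each stratum's share of the sample, n_h / n, equals its share of the population, N_h / N, up to rounding. The sketch below uses made-up data again; the toy population and the seed are assumptions.

```stata
* Toy population (hypothetical data; variable names follow the question)
clear
set seed 12345
set obs 1200
generate size      = 1 + floor(2 * runiform())
generate sector    = 1 + floor(3 * runiform())
generate intensity = 1 + floor(2 * runiform())

* Population shares before sampling
tabulate size

* Keep 10% of the observations in each of the 12 cells, without replacement
sample 10, by(size sector intensity)

* Sample shares; these should be close to the population shares above
tabulate size
```

Since `sample` drops the unselected observations from memory, it is safer to run this on a copy of the population file or inside `preserve` and `restore`.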
Tobias
  • Closing is a negative action; otherwise all threads remain open indefinitely as others may wish to add further answers (or edit them), depending on reputation. You can accept your own answer. http://stackoverflow.com/help/accepted-answer – Nick Cox May 06 '16 at 11:41
  • Thank you, I will keep it open. – Tobias May 06 '16 at 11:53
  • Thank you, I deleted the closing question – Tobias May 06 '16 at 12:10
-1

The problem is that strata may "overlap", so you probably have to rebalance the sample after the initial draw.

The question now is how this can be implemented. The final sample should reflect the proportions in the population as closely as possible.

Marco
  • By definition, strata in stratified sampling partition the population and do not overlap. – Steve Samuels May 05 '16 at 20:50
  • What I wrote was wrong. I meant that the stratification is multidimensional and every individual is part of one stratum in every dimension. – Marco May 05 '16 at 21:06