
I need to calculate the percentage of people with a high school education by state, but the Stata function count does not allow the by option. So far I have the following:

count
local totalPopulation = r(N)

count if schenr==0
local eduBelowHighSchool = r(N)
local _eduBelowHighSchool=`eduBelowHighSchool'/`totalPopulation'

count if schenr==1
local eduHighSchool = r(N)
local _eduHighSchool=`eduHighSchool'/`totalPopulation'

count if schenr==2
local eduCollege = r(N)
local _eduCollege=`eduCollege'/`totalPopulation'

gen eduBelowHighSchool =`_eduBelowHighSchool'
gen eduHighSchool =`_eduHighSchool'
gen eduCollege =`_eduCollege'

How can I calculate individual values for each state? I cannot use count, by(state), can I? The above code produces the following table:

[image: table produced by the code above]

Is there any other way to work around this issue?

Buras

1 Answer

count is a Stata command, not a function. In Stata, "function" is not another term for "command"; rather, commands and functions are separate.

In your example, schenr is evidently 1 when high school and 0 when not. It follows that the percent you want is just the mean of schenr (multiplied by 100).

If you want a new variable,

egen pc_highschool = mean(100 * schenr), by(state) 

will put the same percent in every observation to which it applies. To get a listing of each state just once,

egen tag = tag(state)
l state pc_highschool if tag 

is one way to do it.
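
As a runnable sketch of the same idea, here it is on the auto dataset shipped with Stata. The stand-ins are assumptions made for illustration only: rep78 plays the role of state and the 0/1 variable foreign plays the role of schenr; pc_foreign is a made-up variable name.

```stata
sysuse auto, clear

* foreign is 0 or 1, so its group mean times 100 is the percent foreign
egen pc_foreign = mean(100 * foreign), by(rep78)

* tag exactly one observation per distinct value of rep78
egen tag = tag(rep78)

* list each group once
sort rep78
list rep78 pc_foreign if tag, noobs
```

Note that tag() does not tag observations with missing rep78 unless its missing option is specified, so that group is left out of the listing.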

But you can get tables using tabulate, table, or tabstat without necessarily creating a new variable. Consider this reproducible example:

. sysuse auto 

. tabulate rep78, su(foreign) nost

        Repair |   Summary of Car type
   Record 1978 |        Mean       Freq.
   ------------+------------------------
             1 |           0           2
             2 |           0           8
             3 |          .1          30
             4 |          .5          18
             5 |   .81818182          11
   ------------+------------------------
         Total |   .30434783          69

. gen foreign2 = 100 * foreign

. tabulate rep78, su(foreign2) nost

        Repair |   Summary of foreign2
   Record 1978 |        Mean       Freq.
   ------------+------------------------
             1 |           0           2
             2 |           0           8
             3 |          10          30
             4 |          50          18
             5 |   81.818182          11
   ------------+------------------------
         Total |   30.434783          69

If you want a new dataset, use contract. (Compare the answer to your very recent question "How to retrieve data from multiple Stata files?".)
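
A minimal sketch of the contract route, again on the auto dataset with rep78 standing in for state and foreign for schenr (both stand-ins are assumptions, as are the names group_total and pc). contract itself returns counts in _freq; its percent option gives percents of the whole dataset, so within-group percents take one more step:

```stata
sysuse auto, clear

* one observation per (rep78, foreign) combination, counts stored in _freq;
* zero keeps combinations that never occur, nomiss drops missing values
contract rep78 foreign, zero nomiss

* total count within each rep78 group
bysort rep78: egen group_total = total(_freq)

* percent of each foreign value within its rep78 group
gen pc = 100 * _freq / group_total

list rep78 foreign _freq pc, sepby(rep78)
```

contract replaces the data in memory, so wrap it in preserve and restore if the original dataset is still needed afterwards.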

Nick Cox
  • Many thanks for your constant help. Actually, the `schenr` variable can take the values 0, 1, and 2 (I specified 0 and 1 to simplify the question). So, can I still use `egen...by(state)` in this case? I think `mean` is not going to work, so is there any other command/function I could use instead of `count`? – Buras May 29 '13 at 23:44
  • Quite so on your data. The `egen` approach can easily be adapted to that. `egen pc_highschool = mean(100 * (schenr == 1)), by(state)` gives the percent of 1s, and so on. `contract` does do this simply. See also http://www.stata.com/support/faqs/data-management/creating-percent-summary-variables/ and http://www.stata.com/support/faqs/data-management/true-and-false/ – Nick Cox May 30 '13 at 06:42
  • If you are unsure whether something is a command or a function, type `help function` in Stata, which gives you more information on what functions are in Stata and why you cannot use the terms "command" and "function" interchangeably in Stata. – Maarten Buis May 30 '13 at 07:26
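
The adaptation to a variable with more than two values, as described in the comments, might be sketched as follows. This uses the auto dataset so that it runs anywhere; as assumptions for illustration, the multi-valued rep78 stands in for `schenr` and foreign stands in for `state`, and the new variable names are made up.

```stata
sysuse auto, clear

* (rep78 == 3) evaluates to 1 when true and 0 when false,
* so its group mean times 100 is the percent of 3s in each group.
* A missing rep78 would also evaluate to 0, so exclude it explicitly.
egen pc_rep3 = mean(100 * (rep78 == 3)) if !missing(rep78), by(foreign)

* the same pattern for another value
egen pc_rep4 = mean(100 * (rep78 == 4)) if !missing(rep78), by(foreign)

egen tag = tag(foreign)
list foreign pc_rep3 pc_rep4 if tag, noobs
```

Without the `if !missing(rep78)` qualifier, observations with missing rep78 would count in the denominator as if they were not 3s, which is usually not what is wanted.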