I am trying to generate a dummy variable that = 1 if at least two (out of seven) dummy variables also == 1. Could anybody tell me an efficient way of doing this?
Have you tried anything in particular? Any code attempts you can share? – Roberto Ferrer Jun 29 '15 at 16:32
Let's suppose that the indicator variables concerned (you say "dummy variables", but that term is over-used given its disadvantages) are x1 ... x7. By that definition their values are taken to be 1 or 0, except that values may also be missing. Then the logic for the summary you want is
gen xs = (x1 + x2 + x3 + x4 + x5 + x6 + x7) >= 2 if (x1 + x2 + x3 + x4 + x5 + x6 + x7) < .
That's not too difficult to type, given copy and paste to replicate the syntax for the sum. The if qualifier segregates any observations with a missing value on any of the indicators, and for those the new variable will be missing. Such observations have a total x1 + x2 + x3 + x4 + x5 + x6 + x7 that is itself missing. Stata treats missing as larger than any nonmissing number, and so certainly as greater than 2, which explains why the simpler code
gen xs = (x1 + x2 + x3 + x4 + x5 + x6 + x7) >= 2
would bite you if missing values were present: any observation with a missing indicator would get 1 rather than missing.
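To see the difference concretely, here is a minimal sketch with three made-up observations (the toy data and the variable name naive are hypothetical, for illustration only). The third observation has a missing value on x2: the simpler code gives it 1, while the guarded code gives it missing.

```stata
* Hypothetical toy data: observation 3 has a missing value on x2
clear
input x1 x2 x3 x4 x5 x6 x7
1 1 0 0 0 0 0
1 0 0 0 0 0 0
1 . 0 0 0 0 0
end

* Simpler version: a missing total counts as >= 2, so observation 3 gets 1
gen naive = (x1 + x2 + x3 + x4 + x5 + x6 + x7) >= 2

* Guarded version: observation 3 gets missing instead
gen xs = (x1 + x2 + x3 + x4 + x5 + x6 + x7) >= 2 if (x1 + x2 + x3 + x4 + x5 + x6 + x7) < .

list
```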
If you want a more complicated rule, you may find yourself reaching for the egen functions rowtotal(), rowmiss(), and so forth. See the help for egen.
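As one illustration of such a rule, here is a sketch assuming you would rather return 1 whenever at least two indicators are known to be 1, even if others are missing, and return missing only when the answer cannot be determined. The rule itself, the toy data and the names nones, nmiss and xs2 are assumptions for illustration. rowtotal() treats missing as 0 and rowmiss() counts missing values; note that the varlist x1-x7 assumes the variables are stored in that order.

```stata
* Hypothetical toy data: observations 3 and 4 each have one missing indicator
clear
input x1 x2 x3 x4 x5 x6 x7
1 1 0 0 0 0 0
1 0 0 0 0 0 0
1 . 0 0 0 0 0
1 1 . 0 0 0 0
end

* Number of indicators equal to 1 (rowtotal() treats missing as 0)
egen nones = rowtotal(x1-x7)

* Number of missing indicators in each observation
egen nmiss = rowmiss(x1-x7)

* Assumed rule: two or more known 1s is enough to return 1;
* otherwise the result is known only if nothing is missing
gen xs2 = nones >= 2 if nones >= 2 | nmiss == 0

list
```

Under this rule observation 4 gets 1, whereas the guarded sum in the answer would give it missing.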

The solution I used was gen variable1=0 gen variable1=1 if x1 + x2 + x3 == 1 where == 2 if I am interested in the variable that has 2, >=3 if greater than or equal to 3, and so on. My apologies for the novice question. – Econometrics33 Jun 29 '15 at 20:26
Sorry, but your comment is not clear to me. The code wouldn't run as you post it. If the answer is not what you want, please edit the original question. – Nick Cox Jun 29 '15 at 20:28