Construct new variable from >3 categorical variables (+maintain column names) for mosaic plot in Stata

Question

My question is an extension of that found here: Construct new variable from given 5 categorical variables in Stata

I am an R user and I have been struggling to adjust to the Stata syntax. Also, I'm used to being able to Google for R documentation and examples online, and I haven't found as many resources for Stata, so I've come here.

I have a data set where the rows represent individual people and the columns record various attributes of these people. There are 5 categorical variables (white, hispanic, black, asian, other) that have binary response data, 0 or 1 ("No" or "Yes"). I want to create a mosaic plot of race vs response data using the spineplots package. However, I believe I must first combine all 5 of the categorical variables into a categorical variable with 5 levels that maintains the labels (so I can see the response rate for each ethnicity.) I've been playing around with the egen function but haven't been able to get it to work. Any help would be appreciated.

Edit: Added a depiction of what my data looks like and what I want it to look like.

my data right now:

person_id,black,asian,white,hispanic,responded
1,0,0,1,0,0
2,1,0,0,0,0
3,1,0,0,0,1
4,0,1,0,0,1
5,0,1,0,0,1
6,0,1,0,0,0
7,0,0,1,0,1
8,0,0,0,1,1

what I want is to produce a table through the tabulate command to make the following:

respond, black, asian, white, hispanic
responded to survey |    20, 30, 25, 10, 15

did not respond     |    15, 20, 21, 23, 33

Please read the [Asking](http://stackoverflow.com/help/asking) section of the _Help Center_, especially http://stackoverflow.com/help/mcve. You're essentially asking people to guess what your problem is. We have no information on your data structure, what you've tried, and the problems you're facing. — Roberto Ferrer, Sep 12 '15 at 21:24
People with knowledge of Stata, willing to help, will not necessarily understand what you mean by "headers" and "data frames" (terms from R, I believe). Try using more _neutral_ terms or reading Stata's [User's Guide](http://www.stata.com/bookstore/users-guide/) (a must, anyway), in order to better communicate and increase your chances of getting a relevant answer. — Roberto Ferrer, Sep 12 '15 at 21:33

Brendan · Accepted Answer · 2015-09-14T12:44:21.547

It seems like you want a single categorical variable rather than multiple {0,1} dummies. The easiest way is probably a loop that builds a string variable; another option is to use cond() to generate a numeric variable with value labels (note that you may want to catch respondents for whom all the race dummies are 0 in an 'other' group). Then label the values of responded and create your frequency table:

clear
input person_id black asian white hispanic responded
1 0 0 1 0 0
2 1 0 0 0 0
3 1 0 0 0 1
4 0 1 0 0 1
5 0 1 0 0 1
6 0 1 0 0 0
7 0 0 1 0 1
8 0 0 0 1 1
9 0 0 0 0 1
end

gen race = "other"
foreach v of varlist black asian white hispanic {
    replace race = "`v'" if `v' == 1
}

* the codes returned by cond() must match the value labels defined here
label define race2 1 "asian" 2 "black" 3 "hispanic" 4 "white" 99 "other"
gen race2:race2 = cond(asian == 1, 1, ///
                cond(black == 1, 2, ///
                cond(hispanic == 1, 3, ///
                cond(white == 1, 4, 99))))

label define responded 0 "did not respond" 1 "responded to survey"
label values responded responded
tab responded race

with the result

                    |                          race
          responded |     asian      black   hispanic      other      white |     Total
--------------------+-------------------------------------------------------+----------
    did not respond |         1          1          0          0          1 |         3 
responded to survey |         2          1          1          1          1 |         6 
--------------------+-------------------------------------------------------+----------
              Total |         3          2          1          1          2 |         9

tab responded race2 yields the same results with a different ordering: the columns follow the numeric values of race2 rather than the alphabetical order of the strings in race, so "other" (coded 99) comes last.
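
For the mosaic plot itself, here is a minimal sketch. It assumes the community-contributed spineplot command (available from SSC) and assumes that responded and race2 already exist and carry value labels, as in the code above. As far as I know, the syntax takes the y-axis variable first and the x-axis variable second.

```stata
* one-time install of the community-contributed command (assumes internet access)
ssc install spineplot

* response (y axis) by race (x axis); races are ordered by the numeric codes of race2
spineplot responded race2
```

Using the numeric, labeled race2 rather than the string race keeps the category order under your control through the codes you assign.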
