
I have 7 items/variables in Stata that address the same survey question. These 7 items are each different weight control behaviors (diet, exercise, pills, etc.). I am trying to combine these variables to create a single weight control behavior dummy variable that is coded as yes (did engage in weight control) and no (did not engage in weight control).

The response options for each variable look something like this for a given weight control behavior

dieted
Freq.    Code   Label
11438    0      not marked
2771     1      marked
16       6      refused
6508     7      legitimate skip
13       8      don’t know

Here is my code. I re-coded 6,7,8 for all 7 vars as missing:

tab1 h1gh30a-h1gh30g, m
foreach X of varlist h1gh30a-h1gh30g {
    replace `X' = . if `X' > 1
}
egen wgt_control = rowmax(h1gh30a-h1gh30g)
ta wgt_control
gen wgt_control_new = wgt_control
replace wgt_control_new = 1 if wgt_control > 0 & wgt_control != .
replace wgt_control_new = 0 if wgt_control < 1
ta wgt_control_new

I used rowmax() to combine all 7 items but my issue is that the response option 0 or No doesn't appear when I tabulate it. I only get those who responded yes=1.

Nick Cox

2 Answers


Here is a suggestion, with a reproducible example, for what I think is the cleanest approach. I have also included some unsolicited advice about survey data best practices.

* Example generated by -dataex-. For more info, type help dataex
clear
input double(h1gh30a h1gh30b h1gh30c)
1 1 1
1 0 1
6 1 8
0 0 0
7 6 8
end

* Explicit coding is better, so if possible (and it is with 7 vars),
* create a local in which the vars are explicitly listed
local wgt_controls h1gh30a h1gh30b h1gh30c

* Recode is a better command to use here. And do not destroy information:
* for survey data quality assurance, a respondent refusing to answer, not
* knowing, and a skipped question are different things. You can replace these
* survey codes with extended missing values, which behave like missing values
* but retain the differences between the survey codes
recode `wgt_controls' (6=.a) (7=.b) (8=.c)

* While rowmax() could be used, I think anymatch() is a better fit
* for what you are trying to do
egen wgt_control = anymatch(`wgt_controls'), values(1) 
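One thing to watch with anymatch(): it returns 0 whenever no variable equals 1, and that includes observations where every item is missing. If you would rather treat those as missing than as "no", a minimal sketch along these lines might help. It reuses the example data above; `n_valid` is a hypothetical helper name, not something from the original data.

```stata
* Sketch only: same toy data as the example above
clear
input double(h1gh30a h1gh30b h1gh30c)
1 1 1
1 0 1
6 1 8
0 0 0
7 6 8
end

local wgt_controls h1gh30a h1gh30b h1gh30c
recode `wgt_controls' (6=.a) (7=.b) (8=.c)

* 1 if any item equals 1, otherwise 0 (also 0 when all items are missing)
egen wgt_control = anymatch(`wgt_controls'), values(1)

* n_valid (hypothetical name) counts the non-missing items per observation
egen n_valid = rownonmiss(`wgt_controls')

* Assumption: an observation with no valid item should be missing, not 0
replace wgt_control = . if n_valid == 0

tabulate wgt_control, missing
```

In this toy data the last observation (7 6 8) has no valid item, so it would end up missing rather than counted as 0. Whether that is the right rule for your survey is a judgment call.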
TheIceBear
  • Thanks, I just tried this, but I am getting an error after recode command where the output says "too few variables specified" – Radhika Prasad May 18 '22 at 19:16
  • Did you first try by running my code example exactly as is before trying to adapt it to your data? – TheIceBear May 18 '22 at 19:42
  • The `local` has to be visible to the command that uses it. See https://journals.sagepub.com/doi/pdf/10.1177/1536867X20931028 – Nick Cox May 19 '22 at 00:07
  • Yes, all of this code needs to be run as a script in the do-file editor. This will not work if you run the code one line at a time from the command window – TheIceBear May 19 '22 at 11:38
  • Sorry for the delay. The data that I'm working with is stored in a cold room so I have limited access. Yes, I've been running the code in the do-file editor. But I think you're right, I need to run your example as it is and then adapt it to all 7 items. I probably rushed this step, so it was throwing an error. I will try again tomorrow. – Radhika Prasad May 19 '22 at 16:09
  • Thanks for the help! Your example worked and I was able to adapt it. I'm just wondering what is the purpose of a "local" in this case? – Radhika Prasad May 23 '22 at 17:54
  • Read here: https://en.wikipedia.org/wiki/Don%27t_repeat_yourself. It reduces the source of errors, makes your code more succinct, makes it easier for you to update your code without forgetting to update in all locations etc. But your task is possible to solve without using the local. – TheIceBear May 23 '22 at 17:59
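To illustrate the point in the comments about the local being visible: a local macro only exists within the run that defines it. If the `local` line and the `recode` line are executed in separate runs, the macro expands to nothing and `recode` is left without a variable list. A small sketch, using made-up toy values for the same variable names:

```stata
* Run this whole block in one go from the do-file editor
clear
input double(h1gh30a h1gh30b h1gh30c)
1 0 6
7 1 8
end

* Define the local and use it in the same run
local wgt_controls h1gh30a h1gh30b h1gh30c

* Check what the macro expands to before relying on it
display "`wgt_controls'"

recode `wgt_controls' (6=.a) (7=.b) (8=.c)
```

If `display` shows an empty line, the local was not defined in the current run.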

There is no minimal reproducible example here, so we can't reproduce the problem independently.

From your code, it seems that h1gh30a-h1gh30g are recoded so that all are 0, 1 or missing, so their maximum takes one of the same values.

gen wgt_control_new = wgt_control
replace wgt_control_new = 1 if wgt_control>0 & wgt_control!=. 
replace wgt_control_new= 0 if wgt_control <1   

seems to boil down to cloning the variable:

gen wgt_control_new = wgt_control 

In short, I can't see a reason in your code why you should never see 0 as a possible result.
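As a hedged illustration of how rowmax() treats zeros and missings, here is a toy dataset (the variables `a`, `b`, `c` and `mx` are made up for this sketch and have nothing to do with your data):

```stata
* Sketch only: toy data to show what rowmax() returns
clear
input byte(a b c)
0 0 0
0 1 .
. . .
0 . .
end

* rowmax() ignores missing values; it is missing only if all are missing
egen mx = rowmax(a b c)

list
```

Here the first and last observations should get a maximum of 0, the second gets 1, and the third is missing. So a 0 appears whenever 0 is the only non-missing value in the row.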

EDIT

A minimal check on whether there are zeros that aren't showing up as they should might be

egen max = rowmax(h1gh30a-h1gh30g)  

list h1gh30a-h1gh30g if max == 0
Nick Cox
  • My output after ta wgt_control_new only shows those who answered yes (1 = 14,209), which is the same result as after running rowmax(). I think the conditions I specify for the replace commands could be the problem. However, shouldn't the 0 = No response appear after running rowmax()? – Radhika Prasad May 18 '22 at 19:03
  • 0 should be an observed maximum if and only if 0 is the only non-missing value in an observation. – Nick Cox May 18 '22 at 19:14
  • See now EDIT to the post. – Nick Cox May 19 '22 at 00:08
  • After checking for zeros, only 1, 6, 7, 8 show up. In this case, is there any way to retain the 0/No response category using an approach with rowmax()? – Radhika Prasad May 23 '22 at 18:05
  • I think you need rowmin() if you want to catch any zeros. Or anymatch() as suggested by @TheIceBear (and first written by me). – Nick Cox May 23 '22 at 18:27