I am trying to create dummy variables in Stata that equal 1 if any of the variables dx1 through dx25 starts with a specific string. I know that I can do this using something like the following, but extended to all 25 dx variables:
gen dummy=0
replace dummy=1 if substr(dx1,1,4)=="6542" | substr(dx2,1,4)=="6542"
I would then create other dummies equal to 1 if any of the dx variables starts with one of these:
6542 6522 6696 6410 6411 6412 6630 218 6426 459 490 491 492 493 494 495 496 9971 250 2810 28249 05410 054 657 V27.2 V27.3 V27.4 V27.5 V27.6 V27.7
I have been trying to figure out a more efficient and elegant way to do this.
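To make the goal concrete, here is a rough sketch of the kind of loop I imagine, not a tested solution. It assumes dx1 through dx25 are all string variables, uses only three of the prefixes for space, and the d_* dummy names are hypothetical names made up for illustration. strtoname() is used because a prefix such as V27.2 contains a character that is not allowed in a variable name.

```stata
* Sketch only: prefix list shortened for space; d_* names are hypothetical
local prefixes 6542 6522 V27.2

foreach p of local prefixes {
    * Build a legal variable name from the prefix, e.g. d_6542 or d_V27_2
    local name = strtoname("d_`p'")
    gen byte `name' = 0

    * Flag the observation if any dx variable begins with this prefix
    forvalues i = 1/25 {
        replace `name' = 1 if substr(dx`i', 1, strlen("`p'")) == "`p'"
    }
}
```

The prefixes are compared literally, so a prefix written with a dot, such as V27.2, would not match a value stored without one, such as V270.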
Data structure example (I will keep it to dx1 through dx5 here for space reasons):
+---------------------------------------+
| dx1 dx2 dx3 dx4 dx5 |
|---------------------------------------|
1. | 65421 V270 |
2. | 65221 65801 64232 65951 64892 |
3. | 64511 V270 |
4. | 64781 V270 |
5. | 65571 66331 64891 340 V270 |
|---------------------------------------|
6. | 66401 67202 66331 V270 |
7. | 66411 V270 V1321 |
8. | 65571 V270 V5864 |
9. | 65421 V270 V252 |
10. | 64511 64231 66331 66401 V270 |
|---------------------------------------|
11. | 65651 66401 V270 |
12. | 650 V270 |
13. | 64881 66541 66331 V270 V161 |
14. | 66311 65971 V270 |
15. | 64781 V270 V1589 |
|---------------------------------------|
16. | 65571 66191 V270 |
17. | 64241 66401 V270 |
18. | 66031 65971 66071 V270 |
19. | 64841 66401 30520 V270 |
+---------------------------------------+