2

I'm using the levelsof command to identify unique values of a variable and stick them into a macro. Then later on I'd like to use those values in the macro to select records from another dataset that I'll load.

What I have in mind is something along the following lines:

keep if inlist(variable, "`macrovariable'")

Does that work? And is there another more efficient option? I could do this easily in R (because vectors are easier to work with than macros), but this project requires Stata.


Clarification:

If I have a variable with three distinct values, a, b and c, I want to store those in a macro so I can later take another dataset and select observations that match one of those values.

Normally I can use the inlist function to do this manually, but I'd like to soft-code it so I can run the program with different sets of values. And I can't get the inlist function to work with macros.

aesir
  • Can you be more specific? Have you tried it? If so, does it work? If not, what's the problem you're having? – goric Oct 19 '12 at 18:14
  • Although commonly used, "unique" (meaning strictly, occurs once only) is not the best term here. I recommend "distinct". – Nick Cox Nov 05 '14 at 09:35

3 Answers

3
* the source data
levelsof x, local( allx )
* make it -inlist-friendly
local allxcommas : subinstr local allx  " " ", ", all
* bring in the new data
use if inlist(x, `allxcommas') using blah.dta
StasK
  • Is there a solution for situations where `allxcommas` list is too long and throws an error of 'expression too long'? – radek Dec 12 '14 at 14:51
  • Is it too long at the point of the `local` evaluation in the fourth line (which I edited to make more robust), or at the point of `inlist()` evaluation in the last line? I am afraid that the latter cannot be meaningfully overcome, sans splitting `allxcommas` into manageable chunks, and then `append`ing the results together. – StasK Dec 12 '14 at 14:54
  • Many thanks for the update. The chunk failed me when the list was getting too long and `browse` was complaining. Good idea with `use using` - will give it a go. – radek Dec 15 '14 at 09:29
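
The chunk-and-append idea from the comments above could look roughly like the following. This is only a sketch under assumptions: x is taken to be numeric, blah.dta is the same placeholder file as in the answer, and the chunk size of 50 is an arbitrary illustrative choice, not a documented limit.

```stata
* Sketch only: assumes x is numeric and blah.dta is the second dataset
levelsof x, local(allx)
local chunksize 50                  // hypothetical value; adjust as needed
local n : word count `allx'

tempfile result
local first 1

forvalues start = 1(`chunksize')`n' {
    * build ", v1, v2, ..." for the values in this chunk
    local chunk
    local stop = min(`start' + `chunksize' - 1, `n')
    forvalues i = `start'/`stop' {
        local v : word `i' of `allx'
        local chunk "`chunk', `v'"
    }
    * `chunk' begins with a comma, so it follows x directly
    use if inlist(x `chunk') using blah.dta, clear
    * stack this chunk onto whatever earlier chunks produced
    if !`first' {
        append using `result'
    }
    save `result', replace
    local first 0
}
```

After the loop, the data in memory should hold every observation of blah.dta whose x matched one of the collected values. Each pass evaluates a short `inlist()` call, which is the point of the splitting.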
0

I suspect your difficulty in using a macro generated by levelsof with inlist is that you forgot to use the separate(,) option. I also do not believe you can use the inlist function with keep if, so you will need to add the extra step of defining a new indicator.

In the example below I used the 1978 auto data and created a variable make_abb of vehicle manufacturers (or make) which took only a handful of distinct values ("Do" for Dodge, etc.).

I then used the levelsof command to generate a local macro of the manufacturers which had a vehicle model with a poor repair record (the variable rep78 is a categorical repair record variable where 1 is poor and 5 is good). The option separate(,) is what adds the commas into the macro and enables inlist to read it later on.

Finally, if I want to drop the manufacturers which did not have a poor repair record, I generate a dummy variable named "keep_me" and fill it in using the inlist function.

*load some data
sysuse auto 
*create some make categories by splitting the make and model string
gen make_abb=substr(make,1,2)
lab var make_abb "make abbreviation (string)"
*use levelsof with "local(macro_name)" and "separate(,)" options
levelsof make_abb if rep78<=2, separate(,) local(make_poor)
*generate a dummy using inlist and your levelsof macro from above
gen keep_me=1 if inlist(make_abb,`make_poor')
lab var keep_me "dummy of makes that had a bad repair record"
*now you can discard the rest of your data
keep if keep_me==1
  • "I also do not believe you can use the `inlist()` function with `keep if`": there is absolutely no intrinsic problem with that. Consider `sysuse auto, clear` followed by `keep if inlist(rep78, 1, 2, 3)`. What's the difficulty? – Nick Cox Nov 18 '15 at 17:17
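
As the comment above points out, the indicator variable is optional. A minimal sketch of the same example with `keep if` applied directly, reusing the variable and macro names from the answer:

```stata
* Same auto-data example, without the intermediate dummy
sysuse auto, clear
gen make_abb = substr(make, 1, 2)
* comma-separated list of makes with a poor repair record
levelsof make_abb if rep78 <= 2, separate(,) local(make_poor)
* filter directly on the inlist() condition
keep if inlist(make_abb, `make_poor')
```

Note that `inlist()` accepts only a limited number of arguments, and fewer for strings than for numbers, so this form suits short lists like this one.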
-1

This seems to work for me.

* "using" data
clear
tempfile so
set obs 10
foreach v in list a b c d {
    generate `v' = runiform()
}
save `so'

* "master" data
clear
set obs 10
foreach v in list e f g h {
    generate `v' = runiform()
}

* merge
local tokeepusing a b
merge 1:1 _n using `so', keepusing(`tokeepusing')

Yields:

. list

     +------------------------------------------------------------------------------------------+
     |     list          e          f          g          h          a          b        _merge |
     |------------------------------------------------------------------------------------------|
  1. | .7767971   .5910658   .6107377   .7256517    .357592   .8953723   .0871481   matched (3) |
  2. |  .643114   .6305301   .6441092   .7770287   .5247816   .4854506   .3840067   matched (3) |
  3. | .3833295    .175099   .4530386   .5267127    .628081   .2273252   .0460549   matched (3) |
  4. | .0057233   .1090542   .1437526   .3133509    .604553   .9375801   .8091199   matched (3) |
  5. | .8772233   .6420991   .5403687   .1591801   .5742173   .8948932   .4121684   matched (3) |
     |------------------------------------------------------------------------------------------|
  6. | .6526399   .5137199    .933116   .5415702   .4313532   .8602547   .5049801   matched (3) |
  7. | .2033027   .8745837      .8609   .0087578   .9844069   .1909852   .3695011   matched (3) |
  8. | .6363281   .0064866   .6632325    .307236   .9544498   .6267227   .2908498   matched (3) |
  9. |  .366027   .4896181   .0955155   .4972361   .9161932   .7391482    .414847   matched (3) |
 10. | .8637221   .8478178   .5457179   .8971257   .9640535    .541567   .1966634   matched (3) |
     +------------------------------------------------------------------------------------------+

Does this answer your question? If not, please comment.

Richard Herron
  • I don't think this addresses the problem the OP asked about. – StasK Oct 21 '12 at 00:33
  • I'm not sure, either, @StasK. If OP really needs the `levelsof` feature to id observations in another data set, why not just `merge` on that variable? – Richard Herron Oct 21 '12 at 01:57
  • @StasK -- You're right, I'm answering a different question. But why not `merge`, then? – Richard Herron Oct 21 '12 at 02:05
  • `merge` requires a substantial overhead of comparing the values of the id variable and sorting the `using` data (the `master` data will consist of three observations here, and it is not a big deal to sort it). `use using if results of levelsof` will work faster, as it is just a filter and only requires one pass through the `using` data, without having to save a sorted version back to the disk. So I would prefer that solution over `merge` most of the time. The only caveat though is that `levelsof` is not particularly fast in large data sets. – StasK Oct 21 '12 at 03:01
  • @StasK -- I see. Good point. (Although I use `levelsof` mostly for looping and often find that I have too many levels and have to switch to an `egen`/`group()` solution.) Thanks for the pointers! – Richard Herron Oct 21 '12 at 13:21
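
For comparison, the `merge` route discussed in these comments might be sketched as follows. It assumes the data in memory contain the variable x and that blah.dta (the placeholder file name from the first answer) also contains x. It carries the sorting and matching overhead StasK describes, but it does not depend on how many values `inlist()` can take.

```stata
* Sketch only: reduce the data in memory to the distinct values of x
keep x
duplicates drop
* keep only observations of blah.dta whose x appears in that list
merge 1:m x using blah.dta, keep(match) nogenerate
```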