-1

My data is more than 70,000. I have more than 50 variables. (Var1 to Var50). In each variable, there are about about 30 groups (I'll use a to z). I am trying to get a selection of data using if statements. I'd like to select every data with the same group. Eg data in var 1 to 30 with a, data with var 1 to 30 in b. I seem to be writing If (Var1="a" and Var2="a" and Var3="a" and Var4="a" and all the way to var50= "a") or (Var1="b" and Var2="a" and Var3="b" and Var4="b" and all the way to var50= "b")... How do I consolidate? I tried using an array but it didnt work and i was not sure if arrays work in the IF and then statement.

    IF (VAR2="A" or VAR2="B" or VAR2="C" or VAR2="D" 
    or VAR3="A" or VAR3="B" or VAR3="C" or VAR3="D"
 or VAR4="A" or VAR4="B" or VAR4="C" or VAR4="D" 
 or VAR5="A" or VAR5="B" or VAR5="C" or VAR5="D"
 or VAR6="A" or VAR6="B" or VAR6="C" or VAR6="D"
 or VAR7="A" or VAR7="B" or VAR7="C" or VAR7="D" 
 or VAR8="A" or VAR8="B" or VAR8="C" or VAR8="C" 
 or VAR9="A" or VAR9="B" or VAR9="C" or VAR9="D" 
 or VAR10="A" or VAR10="B" or I10_D10="C" or VAR10="D"

 or VAR12="A" or VAR12="B" or VAR12="C" or VAR12="D" 
 or VAR13="A" or VAR13="B" or VAR13="C" or VAR13="D"
 or VAR14="A" or VAR14="B" or VAR14="C" or VAR14="D" 
 or VAR15="A" or VAR15="B" or VAR15="C" or VAR15="D"
 or VAR6="A" or VAR16="B" or VAR16="C" or VAR16="D"
 or VAR17="A" or VAR17="B" or VAR17="C" or VAR17="D" 
 or VAR18="A" or VAR18="B" or VAR18="C" or VAR18="C" 
 or VAR19="A" or VAR19="B" or VAR19="C" or I10_D19="D" 
 or VAR20="A" or VAR20="B" or I10_D20="C" or VAR20="D" 
or VAR21="D" or VAR22="A" or VAR22="B" or VAR22="C" or VAR22="D" 
 or VAR23="A" or VAR23="B" or VAR23="C" or VAR23="D"
 or VAR24="A" or VAR24="B" or VAR24="C" or VAR24="D" 
 or VAR25="A" or VAR25="B" or VAR25="C" or VAR25="D"
 or VAR26="A" or VAR26="B" or VAR26="C" or VAR26="D"
 or VAR27="A" or VAR27="B" or VAR27="C" or VAR27="D" 
 or VAR28="A" or VAR28="B" or VAR28="C" or VAR28="C" 
 or VAR29="A" or VAR29="B" or VAR29="C" or VAR29="D" 
 or VAR30="A" or VAR30="B" or I10_D30="C" or VAR30="D") 
 then Group=1; else Group=0;
  • I'm confused what you are trying to test. what is your desired output? Can you create a workable example with the input and desired output? – DomPazz Aug 30 '18 at 18:27
  • 1
    Can you show example input and output for a simple case with just 2 or three variables and a handful of observations? – Tom Aug 31 '18 at 01:37
  • This is likely an XY situation. It would help a lot if you would provide some sample input and expected output and what you're trying to do. It looks like dummy variables so I would consider GLMMOD to create the dummies. Or you could use WHICHC to at least reduce the comparisons or groups. I'll provide code once sample data and output is shown. – Reeza Sep 05 '18 at 20:00

4 Answers4

0

You probably don't need a macro, however a macro might be faster.

%let value=a;

data want;
set have;
array var[50];
keepit=1;
do i=1 to 50;
   keepit = keepit and (var[i]="&value");
   if ^keepit then
      leave;
end;
if keepit;
drop i keepit;
run;

I create a signal variable and update it's value, it will be false if any value in the var[] array is not the &value. I leave the loop early if we find 1 non-matching value, to make it more efficient.

DomPazz
  • 12,415
  • 17
  • 23
  • you have to check all the variables and need a way to follow if all the variables match. You cannot do something like `do i=1 to 50; if var[1]="a"; end;`. You have to maintain state throughout the loop. – DomPazz Aug 30 '18 at 17:58
  • I added the codes that I tried. can you look at it. – confused programmer Aug 30 '18 at 18:12
  • What are you trying to test? You definitely cannot use a `where` clause like you have it. – DomPazz Aug 30 '18 at 18:24
  • I am trying to put in conditions (many of them) to analyze some data – confused programmer Aug 30 '18 at 18:35
  • then alter the condition in my example to meet your needs. – DomPazz Aug 30 '18 at 18:40
  • For this type of testing I would (and have previously) generate a function style macro that takes two lists as input and generates the long series of `var='value'` comparisons joined by `OR` as in the original post. Then you could use the macro call in a SAS statement.`group=%dxtest(var1 var2,'A' 'B');` – Tom Sep 06 '18 at 14:02
0

It's not exactly clear what you want. If you want to avoid checking all variables you can use WHICHC to find if any in a list are A.

X = whichc('a', of var1-var30);

If you want to see what different groups you have across all the variables, I think a big proc freq is what you want:

 proc freq data=have noprint;
   table var1*var2*var3*var4....*var30*gender*age / list out=table_counts;
 run;

And then check the table_counts data set to see if that has what you want.

If neither of these are what you want, you need to add more details to your question. A sample of data and expected output would be perfect.

Reeza
  • 20,510
  • 4
  • 21
  • 38
0

When I need to search several variables for a particular value what I will do is - combine all variables into one string and then search that string. Like this:

*** CREATE TEST DATA ***;
data have;
    infile cards;
    input VAR1 $ VAR2 $ VAR3 $ VAR4 $ VAR5 $;
    cards;
J J K A M
S U I O P
D D D D D
l m n o a
Q U J C S
;
run;

data want;
    set have;

    *** USE CATS FUNCTION TO CONCATENATE ALL VAR# INTO ONE VARIABLE ***; 
    allvar = cats(var1, var2, var3, var4, var5); 

    *** IF NEEDED, APPLY UPCASE TO CONCATENATED VARIABLE ***;
    *allvar = upcase(allvar);

    *** USE INDEXC FUNCTION TO SEARCH NEW VARIABLE ***;
    if indexc(allvar, "ABCD") > 0 then group = 1;
    else group = 0;
run;

I'm not sure if this is exactly what you need, but hopefully this is something you can modify for your particular task.

SunnyRJ
  • 383
  • 1
  • 7
0

The code as posted is testing if ANY of a list of variables match ANY of a list of values.

Let's make a simple test dataset.

data have ;
  input id (var1-var5) ($);
cards;
1 E F G H I
2 A B C D E
;;;;

Make one array of the values you want to find and one array of the variables you want to check. Loop over the list of variables until you either find one that contains one of the values or you run out of variables to test.

data want ;
  set have;
  array values (4) $8 _temporary_ ('A' 'B' 'C' 'D');
  array vars var1-var5 ;
  group=0;
  do i=1 to dim(vars) until (group=1);
    if vars(i) in values then group=1;
  end;
  drop i;
run;

You could avoid the array for the list of values if you want.

  if vars(i) in ('A' 'B' 'C' 'D') then group=1;

But using the array will allow you to make the loop run over the list of values instead of the list of variables.

  do i=1 to dim(values) until (group=1);
    if values(i) in vars then group=1;
  end;

Which might be important if you wanted to keep the variable i to indicate which value (or variable) was first matched.

Tom
  • 47,574
  • 2
  • 16
  • 29