2

In PROC SQL, I need to select all rows where a column called "NAME" does not contain multiple values "abc", "cde" and "fbv" regardless of what comes before or after these values. So I did it like this:

SELECT * FROM A WHERE
  NAME NOT CONTAINS "abc" 
  AND
  NAME NOT CONTAINS "cde"
  AND
  NAME NOT CONTAINS "fbv";

which works just fine, but I imagine it would be a headache if we had a hundred of conditions. So my question is - can we accomplish this in a single statement in PROC SQL? I tried using this:

SELECT * FROM A WHERE 
  NOT CONTAINS(NAME, '"abc" AND "cde" AND "fbv"');

but this doesn't work in PROC SQL, I am getting the following error:

ERROR: Function CONTAINS could not be located.

I don't want to use LIKE.

puk789
  • 322
  • 2
  • 8
  • 28

3 Answers3

2

You could use regular expressions, I suppose.

data a;
input name $;
datalines;
xyabcde
xyzxyz
xycdeyz
xyzxyzxyz
fbvxyz
;;;;
run;

proc sql;

SELECT * FROM A WHERE
  NAME NOT CONTAINS "abc" 
  AND
  NAME NOT CONTAINS "cde"
  AND
  NAME NOT CONTAINS "fbv";


SELECT * FROM A WHERE
  NOT (PRXMATCH('~ABC|CDE|FBV~i',NAME));
quit;

You can't use CONTAINS that way, though.

Joe
  • 62,789
  • 6
  • 49
  • 67
  • Worked great, thanks! ~ means wildcard I am assuming? – puk789 Nov 11 '16 at 11:06
  • 1
    `~` is just a delimiter, you have to start and end a regex with any delimiter that's not used in the regex. Regular expressions match 'anywhere' unless you anchor them to start or end. – Joe Nov 11 '16 at 11:07
0

You can use NOT IN:

SELECT * FROM A WHERE 
  NAME NOT IN ('abc','cde','fbv');
Kumar_Vikas
  • 837
  • 7
  • 16
  • But I need it to exclude any record(s) containing e.g. 'abc' regardless of how many characters are before or after 'abc'. – puk789 Nov 11 '16 at 10:38
0

If the number of items is above reasonable number to build inside code, you can create a table (work.words below) to store the words and iterate over it to check occurrences:

data work.values;
input name $;
datalines;
xyabcde
xyzxyz
xycdeyz
xyzxyzxyz
fbvxyz
;
run;

data work.words;
length word $50;
input word $;
datalines;
abc
cde
fbv
;
run;


data output;
set values;

/* build a has of words */
length word $50;
if _n_ = 1 then do;
   /* this runs once only */
   call missing(word);
   declare hash words (dataset: 'work.words');
   words.defineKey('word');
   words.defineData('word');
   words.defineDone();
end;

/* iterate hash of words */
declare hiter iter('words'); 
rc = iter.first();
found = 0;
do while (rc=0); 
   if index(name, trim(word)) gt 0 then do; /* check if word present using INDEX function */
      found= 1;
      rc = 1;
   end;
   else rc = iter.next();
end;
if found = 0 then output; /* output only if no word found in name */
drop word rc found;
run;
vasja
  • 4,732
  • 13
  • 15