I have a long character variable (up to 12,000 characters), and I would like to find a word within it that sounds like a given word.
I'd also like to create a flag variable that equals 1 if such a word is present. Let's say, for argument's sake, that the word I'm trying to find is "Mary" (not case-sensitive). Here are four sample values of a variable called "string" in a dataset called "question":
- Mary had a little lamb its fleece was white as snow
- Jack be nimble Jack be quick Jack jump over the candlestick
- I think you and I should marry each other
- I actually do not want to get married
The flag variable should equal 1 for strings 1 and 3 (because of "Mary" and "marry") and 0 for strings 2 and 4.
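For anyone who wants to reproduce this, the sample data could be built with a step along these lines. This is only a sketch: the length of 12,000 matches the maximum mentioned above, and the informat width of 200 is an assumption that is simply wide enough for these four lines.

```sas
/* Build the sample dataset "question" with one character variable "string" */
DATA question;
  LENGTH string $ 12000;          /* maximum length mentioned in the question */
  INFILE DATALINES TRUNCOVER;     /* do not jump to the next line on short records */
  INPUT string $CHAR200.;         /* assumed width, enough for the sample lines */
  DATALINES;
Mary had a little lamb its fleece was white as snow
Jack be nimble Jack be quick Jack jump over the candlestick
I think you and I should marry each other
I actually do not want to get married
;
RUN;
```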
Unfortunately, I don't think I can use this code:
DATA answer;
  SET question;
  IF FINDW(string, SOUNDEX("Mary")) ne 0 THEN flag=1;
  ELSE flag=0;
RUN;
It doesn't work because FINDW searches the string for the literal soundex code of "Mary" rather than for words that sound like "Mary"; the words in the string are never converted to soundex codes themselves. Any thoughts on how to get around this?
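One direction that might get around this, offered as an untested sketch rather than a confirmed solution: split each string into words with SCAN and compare the soundex code of every word against the soundex code of "Mary". It assumes that words are separated by blanks and that no punctuation is attached to them; the loop counter `i` is just a helper name introduced for illustration.

```sas
DATA answer;
  SET question;
  flag = 0;
  /* Walk through the blank-delimited words one at a time */
  DO i = 1 TO COUNTW(string, ' ');
    /* Compare soundex codes; UPCASE is applied so letter case cannot matter */
    IF SOUNDEX(UPCASE(SCAN(string, i, ' '))) = SOUNDEX(UPCASE("Mary")) THEN DO;
      flag = 1;
      LEAVE;                      /* stop at the first match */
    END;
  END;
  DROP i;                         /* helper counter is not needed in the output */
RUN;
```

Because soundex codes are coarse, other words that reduce to the same code as "Mary" (for example "more") would also set the flag, so the matches may need a manual check.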