1

Does anyone know how to generate the possible misspellings of a word?

Example: unemployment -> uemployment, onemploymnet, etc.

iamjeannie
  • What is the definition of a misspelling? Is "employment" a variation of "unemployment"? Without a definition, the set can be infinite. – Chris Pitman Apr 12 '11 at 21:49
  • Dude. Seriously, you should accept the occasional answer now and again. 0% is horrible. – riwalk Apr 12 '11 at 21:59
  • I'm looking for code that would help me generate something to suppress customers who misspelled the employer name. – iamjeannie Apr 12 '11 at 23:03
  • You should return to the questions you have asked and accept answers (I believe it is a checkmark you click next to the answer). – Chris Pitman Apr 13 '11 at 21:56

3 Answers

3

If you just want to generate a list of possible misspellings, you might try a tool like this one. Otherwise, in SAS you could use a function like COMPGED to measure the similarity between the string someone entered and the one you wanted them to type. If the two are "close enough" by your standard, replace their text with the one you wanted.

Here is an example that computes the Generalized Edit Distance between "unemployment" and a variety of plausible misspellings.

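data misspell;
  /* declare lengths before INPUT so they are set only once */
  length misspell string $16;
  retain string "unemployment";
  input misspell $16.;
  /* 'i' ignores case, 'L' removes leading blanks before comparing */
  GED=compged(misspell, string,'iL');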
datalines;
nemployment
uemployment
unmployment
uneployment
unemloyment
unempoyment
unemplyment
unemploment
unemployent
unemploymnt
unemploymet
unemploymen
unemploymenyt
unemploymenty
unemploymenht
unemploymenth
unemploymengt
unemploymentg
unemploymenft
unemploymentf
blahblah
;
proc print data=misspell label;
   label GED='Generalized Edit Distance';
   var misspell string GED;
run;
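
The "close enough" replacement described above could look roughly like the following. This is only a sketch: the cutoff of 250 is an arbitrary assumption you would need to tune against your own data, and the names cleaned, entered, and corrected are made up for illustration.

```sas
/* Sketch: replace an entry with the expected word when its       */
/* generalized edit distance falls under a chosen cutoff.         */
/* ASSUMPTIONS: the cutoff (250) and all variable and data set    */
/* names here are illustrative, not taken from the question.      */
data cleaned;
  length entered corrected $16;
  input entered $16.;
  ged = compged(entered, "unemployment", 'iL');
  if ged <= 250 then corrected = "unemployment";  /* close enough */
  else corrected = entered;                       /* leave as typed */
datalines;
uemployment
onemploymnet
blahblah
;
proc print data=cleaned;
  var entered ged corrected;
run;
```

Printing GED next to the result makes it easier to see where a sensible cutoff lies before you trust the replacement.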
cmjohns
  • Thanks John, I was actually looking for all of those possible values as the SAS output. I should have been clearer. I am looking for SAS code that would give me this output: nemployment uemployment unmployment uneployment unemloyment unempoyment unemplyment unemploment unemployent unemploymnt unemploymet unemploymen unemploymenyt unemploymenty unemploymenht unemploymenth unemploymengt unemploymentg unemploymenft unemploymentf, etc. – iamjeannie Apr 13 '11 at 14:25
  • I don't think you really want to be generating all possible mis-spellings and then matching what was provided with that list, but rather using COMPGED as suggested here to match what was provided against the expected list and then saying it's a match if the spelling distance is suitably low. – Tom Quarendon Apr 19 '11 at 16:26
1

Essentially, you are trying to develop a list of text strings based on some rule of thumb: one letter is missing from the word, a letter is moved to the wrong spot, one letter was mistyped, and so on. The problem is that these rules have to be explicitly defined before you can write the code, in SAS or any other language (this is what Chris was referring to). If your requirement is reduced to this one-wrong-letter scenario, then this might be manageable; otherwise, the commenters are correct and you can easily create massive lists of incorrect spellings (after all, every string except "unemployment" constitutes a misspelling of that word).

Having said that, there are many ways in SAS to accomplish this text manipulation (rx functions, some combination of other text-string functions, macros); however, there are probably better ways to accomplish this. I would suggest an external Perl process to generate a text file that can be read into SAS, but other programmers might have better alternatives.
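
For the narrow one-wrong-letter case, a plain data step may already be enough. The sketch below assumes just two rules, dropping one letter and swapping two neighbouring letters; the data set name variants and the variables word, variant, and rule are made up for illustration, and other rules (such as inserting a nearby key) would need their own loops.

```sas
/* Sketch: generate one-edit misspellings of a single word.       */
/* ASSUMPTIONS: only two rules are implemented (deletion and      */
/* adjacent swap); all names are illustrative.                    */
data variants;
  length word variant $32 rule $12;
  word = "unemployment";
  n = length(word);

  /* Rule 1: drop the letter at each position.                    */
  /* SUBSTRN is used because it tolerates a zero-length piece.    */
  rule = "deletion";
  do i = 1 to n;
    variant = cats(substrn(word, 1, i - 1), substrn(word, i + 1));
    output;
  end;

  /* Rule 2: swap each pair of neighbouring letters.              */
  rule = "swap";
  do i = 1 to n - 1;
    variant = word;
    substr(variant, i, 1)     = substr(word, i + 1, 1);
    substr(variant, i + 1, 1) = substr(word, i, 1);
    if variant ne word then output;  /* identical letters: no change */
  end;

  drop i n;
run;

/* Different rules can yield the same string; keep one of each.   */
proc sort data=variants nodupkey;
  by variant;
run;

proc print data=variants;
  var word rule variant;
run;
```

Keeping the rule name on each row makes it easy to see how fast the list grows as more rules are added.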

RWill
0

If you are looking for a general spell checker, SAS does have proc spell.

It will take some tweaking to get it working for your situation; it's very old and clunky. It doesn't work well in this case, but you may get better results with a different dictionary. A Google search will show other examples.

filename name temp lrecl=256;
options caps;

data _null_;
  file name;
  informat name $256.;
  input name &;
  put name;
  cards;
uemployment 
onemploymnet 
;

proc spell in=name
  dictionary=SASHELP.BASE.NAMES
  suggest;
run;

options nocaps;
richie
  • Thank you richie, but I should have been clearer. I am looking for SAS code that would give me this output: nemployment uemployment unmployment uneployment unemloyment unempoyment unemplyment unemploment unemployent unemploymnt unemploymet unemploymen unemploymenyt unemploymenty unemploymenht unemploymenth unemploymengt unemploymentg unemploymenft unemploymentf, etc. – iamjeannie Apr 13 '11 at 14:22