I have two data sets. The first contains a single repair description, for example:
Electric Component keyboard replacement
The second data set contains all the later repair descriptions for the customers who had the earlier repair phrase, for example:
Electric Keyboard replace
Monitor Component Replacement
Mouse component
Wire Replacement
PIN part
In this example, I would like the match to pick "Electric Keyboard replace" from the second data set as the phrase most similar to "Electric Component keyboard replacement". Here is what I tried:
```sas
/* Target phrase to be matched */
DATA NAME;
  INFILE DATALINES DSD;
  length FIRST $ 1000;
  INPUT FIRST $;
  DATALINES;
Electric Component keyboard replacement
;

/* Candidate phrases from later repairs */
DATA COMPONENT;
  INFILE DATALINES DSD;
  length FIRST_B $ 1000;
  INPUT FIRST_B $;
  DATALINES;
Electric Keyboard replace
Monitor Component Replacement
Mouse component
Wire Replacement
PIN part
;

/* Keep every pair whose phrases "sound like" each other (=* uses SOUNDEX) */
PROC SQL;
  CREATE TABLE Possible_Matches AS
  SELECT *
  FROM Name AS n, COMPONENT AS b
  WHERE (n.FIRST =* b.FIRST_B);
QUIT;
```
This worked with the sounds-like operator (`=*`), and I was excited. But when I changed "Electric Keyboard replace" to "keyboard component replace", it found no match and returned an empty data set. I also tried the COMPARE function but could not get it to work either. I chose this approach because I had seen examples of correcting or matching names and email IDs.

Can similar phrases also be matched with these functions, or is there another way to do this? In my data, matching phrases usually differ by rearranged words, extra words, or shortened words (for example, "replace" instead of "replacement").
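One possible direction, sketched here under stated assumptions rather than as a definitive solution, is to compare phrases word by word instead of as whole strings, so that word order stops mattering. In the hypothetical sketch below, a target word counts as "found" in a candidate phrase when some candidate word is within a COMPGED edit cost of `&maxcost` (an assumed threshold of 100) or when one word is a prefix of the other (so "replace" matches "replacement"; the minimum prefix length of 4 is also an assumption). The score is the number of target words found divided by the larger word count, which penalizes extra words. The data set names `PAIRS`, `SCORED`, `BEST_MATCH` and the macro variable `MAXCOST` are illustrative only, and the sample data swaps in "keyboard component replace" to mirror the failing case.

```sas
/* Hypothetical sketch: word-overlap scoring, ignoring word order.
   MAXCOST and the 4-character prefix rule are assumed values to tune. */
%let maxcost = 100;

data NAME;
  infile datalines truncover;
  length FIRST $ 1000;
  input FIRST $1000.;
  datalines;
Electric Component keyboard replacement
;

data COMPONENT;
  infile datalines truncover;
  length FIRST_B $ 1000;
  input FIRST_B $1000.;
  datalines;
keyboard component replace
Monitor Component Replacement
Mouse component
Wire Replacement
PIN part
;

/* Pair every target phrase with every candidate phrase (Cartesian product) */
proc sql;
  create table pairs as
  select n.FIRST, b.FIRST_B
  from NAME as n, COMPONENT as b;
quit;

data scored;
  set pairs;
  length a b $ 1000 w1 w2 $ 100;
  a = lowcase(FIRST);
  b = lowcase(FIRST_B);
  n1 = countw(a, ' ');
  n2 = countw(b, ' ');
  hits = 0;
  /* For each target word, look for a close or prefix-matching candidate word */
  do i = 1 to n1;
    w1 = scan(a, i, ' ');
    found = 0;
    do j = 1 to n2 while (found = 0);
      w2 = scan(b, j, ' ');
      if compged(trim(w1), trim(w2)) <= &maxcost then found = 1;
      else if min(lengthn(w1), lengthn(w2)) >= 4 and
              (index(w2, trim(w1)) = 1 or index(w1, trim(w2)) = 1) then found = 1;
    end;
    hits = hits + found;
  end;
  /* Share of words matched; dividing by the longer phrase penalizes extra words */
  if max(n1, n2) > 0 then score = hits / max(n1, n2);
  else score = 0;
run;

/* Keep the highest-scoring candidate for each target phrase */
proc sort data=scored;
  by FIRST descending score;
run;

data best_match;
  set scored;
  by FIRST descending score;
  if first.FIRST;
  keep FIRST FIRST_B score;
run;
```

On the sample rows, this should rank "keyboard component replace" above "Monitor Component Replacement", because three of the four target words are matched instead of two. Ties are possible (for example, if both "Electric Keyboard replace" and "keyboard component replace" were present), so in practice you might keep all candidates above a score cutoff rather than only the first one.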