I have a list of email addresses in a SAS dataset, and I want to identify similar addresses in it. I am trying to apply the COMPGED function to the email variable across all rows, then sort the list by that distance so that similar email addresses end up next to each other. Can anybody help with this, please?
-
What's your code at the moment? What's wrong with the result? – Sven R. Feb 07 '16 at 18:31
-
For this type of linkage you can try the options here: the solution from @friedegg is good in terms of compged, and the reference to the-link-king.com is a good option as well. https://communities.sas.com/t5/SAS-Procedures/Name-matching/m-p/82780/highlight/true#M23757 – Reeza Feb 07 '16 at 20:43
1 Answer
Do a self join in proc sql, using the result of compged as the join condition.

Example:

    proc sql;
        create table similar_emails as
        select a.Email as EmailA,
               b.Email as EmailB
        from email_list a
        left join email_list b
            on compged(a.Email, b.Email) <= 200
        order by a.Email;
    quit;

Chris J
-
But I have only one email list. Suppose I have n email IDs. I have to compare the 1st email ID with the other (n-1) email IDs, the 2nd email ID with the other (n-1) IDs, and so on. – Arpan Mondal Feb 07 '16 at 20:07
-
Use a cross join instead of a left join, sort on the score, and add it to the select statement as well. – Reeza Feb 07 '16 at 20:39
-
My example is based on a single list of emails. If you wish to exclude an email from matching to itself, give each row an ID in a preceding datastep, and add `and a.ID ^= b.ID` to the join condition. – Chris J Feb 08 '16 at 08:35
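
The suggestions in these comments can be combined into one rough sketch. It assumes the same `email_list` dataset and `Email` variable as the answer; the names `email_ids`, `ID` and `Score` are made up for illustration, and the cutoff of 200 is simply carried over from the answer rather than tuned. One variation on the comment above: the condition is `a.ID < b.ID` instead of `a.ID ^= b.ID`, so each pair is listed once instead of twice.

```sas
/* Assumption: email_list has one character variable named Email. */

/* Give each row an ID so that a row is never compared with itself. */
data email_ids;
    set email_list;
    ID = _n_;
run;

proc sql;
    create table similar_emails as
    select a.Email as EmailA,
           b.Email as EmailB,
           compged(a.Email, b.Email) as Score   /* generalized edit distance */
    from email_ids as a, email_ids as b         /* cross join of the list with itself */
    where a.ID < b.ID                           /* skip self-matches and mirrored pairs */
      and calculated Score <= 200               /* illustrative cutoff taken from the answer */
    order by Score, EmailA, EmailB;             /* closest pairs first */
quit;
```

This compares every email with every other one, i.e. n(n-1)/2 calls to compged for n rows, so it may be slow on a large list.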