-3

I want a query to find the possible duplicates in a row. Eg : Table A

Name
------------
1.Rajaraju.    
2.Rajuraja.     
3.Vijay. 
4.Ramkumar. 
5.Kumarram.  
6.Sakthi. 
7.Raj ram Ravi. 
8.Ravi Raj ram. 

Want a query to pick all the names which are similar i.e Rajaraju and Rajuraja same, similarly Raj ram Ravi and Ravi Raj ram.....

Littlefoot
  • 131,892
  • 15
  • 35
  • 57
Kumar
  • 1
  • 1
  • 1
    You and I know "Raja" and "Raju" are part of names.But, how are you planning to make Oracle understand that they ought to be compared by picking a part exactly after 4 letters of the entire text (Rajaraju) and not after 2 or 3 or 5? – Kaushik Nayak Mar 02 '19 at 08:25
  • Oracle offers various different matching algorithms, each with strengths and weaknesses. I have a (rather old) answer here which covers them. [Check it out](https://stackoverflow.com/a/8231225/146325) – APC Mar 02 '19 at 09:39

1 Answers1

0

UTL_MATCH.JARO_WINKLER_SIMILARITY might be one choice. Higher value means a better match.

For example, I sorted the result on similarity descending and showed only several rows. You should decide which similarity value satisfies your needs and apply another condition, e.g. where sim >= 80.

SQL> with test (name) as
  2    (select '1.Rajaraju.' from dual union all
  3     select '2.Rajuraja.' from dual union all
  4     select '3.Vijay.' from dual union all
  5     select '4.Ramkumar.' from dual union all
  6     select '5.Kumarram.' from dual union all
  7     select '6.Sakthi.' from dual union all
  8     select '7.Raj ram Ravi.' from dual union all
  9     select '8.Ravi Raj ram.' from dual
 10    ),
 11  -- remove leading numbers and dots
 12  inter as
 13    (select translate(t.name, 'x.0123456789', 'x') name
 14     from test t
 15    )
 16  -- find similarity
 17  select a.name,
 18         b.name,
 19        utl_match.jaro_winkler_similarity(a.name, b.name) sim
 20  from inter a cross join inter b
 21  where a.name < b.name
 22  order by 3 desc;

NAME                 NAME                        SIM
-------------------- -------------------- ----------
Rajaraju             Rajuraja                     87
Raj ram Ravi         Rajuraja                     82
Raj ram Ravi         Ravi Raj ram                 80
Raj ram Ravi         Rajaraju                     78
Rajuraja             Ramkumar                     74
Rajaraju             Ramkumar                     74
Kumarram             Ramkumar                     72
Rajaraju             Ravi Raj ram                 71
Rajuraja             Ravi Raj ram                 71
<snip>
Littlefoot
  • 131,892
  • 15
  • 35
  • 57