There are few things to note about your usage of UTL_MATCH:
- EDIT_DISTANCE_SIMILARITY : Returns an integer between 0 and 100, where 0 indicates no similarity at all and 100 indicates a perfect match.
- JARO_WINKLER_SIMILARITY : Returns an integer between 0 and 100, where 0 indicates no similarity at all and 100 indicates a perfect match but tries to take into account possible data entry errors.
ORDER BY UTL_MATCH.EDIT_DISTANCE_SIMILARITY('%operation Knee right%',diagnosisname) DESC
This is not going to give you correct result. Since, you are only considering the possible similarity, however, not considering the data entry errors. So, you must use JARO_WINKLER_SIMILARITY.
operation Knee right
You need to keep in mind the CASE of the input and the column values to compare. They must be in similar case for a correct match. You are passing the input in LOWERCASE, however, your column values are in INITCAP. Better convert both the column values and the input to a similar case.
Let's look at the below demonstration to understand:
SQL> WITH DATA AS(
2 SELECT 'Heart Operation' diagnosis_name, 'IH123' icd_code FROM dual UNION ALL
3 SELECT 'Knee Operation' diagnosis_name, 'IK123' icd_code FROM dual UNION ALL
4 SELECT 'Left Knee Operation' diagnosis_name, 'IKL123' icd_code FROM dual UNION ALL
5 SELECT 'Right Knee Operation' diagnosis_name, 'IKR123' icd_code FROM dual UNION ALL
6 SELECT 'Fever' diagnosis_name, 'IF123' icd_code FROM dual
7 )
8 SELECT t.*,
9 utl_match.edit_distance_similarity(upper(diagnosis_name),upper('operation Knee right')) eds,
10 UTL_MATCH.jaro_winkler_similarity (upper(diagnosis_name),upper('operation Knee right')) jws
11 FROM DATA t
12 ORDER BY jws DESC
13 /
DIAGNOSIS_NAME ICD_CO EDS JWS
-------------------- ------ ---------- ----------
Right Knee Operation IKR123 20 72
Knee Operation IK123 20 70
Heart Operation IH123 25 68
Left Knee Operation IKL123 25 64
Fever IF123 15 47
SQL>
So, you see how both are different from each other. jaro_winkler_similarity does a better job in identifying the data entry errors and giving the most close match. Based on that, simply choose the first row after sorting it in descending order:
SQL> WITH DATA AS(
2 SELECT 'Heart Operation' diagnosis_name, 'IH123' icd_code FROM dual UNION ALL
3 SELECT 'Knee Operation' diagnosis_name, 'IK123' icd_code FROM dual UNION ALL
4 SELECT 'Left Knee Operation' diagnosis_name, 'IKL123' icd_code FROM dual UNION ALL
5 SELECT 'Right Knee Operation' diagnosis_name, 'IKR123' icd_code FROM dual UNION ALL
6 SELECT 'Fever' diagnosis_name, 'IF123' icd_code FROM dual
7 )
8 SELECT diagnosis_name
9 FROM
10 (SELECT t.*,
11 utl_match.edit_distance_similarity(upper(diagnosis_name),upper('operation Knee right')) eds,
12 UTL_MATCH.jaro_winkler_similarity (upper(diagnosis_name),upper('operation Knee right')) jws
13 FROM DATA t
14 ORDER BY jws DESC
15 )
16 WHERE rownum = 1
17 /
DIAGNOSIS_NAME
--------------------
Right Knee Operation
SQL>