I usually use jaro_winkler and similar functions to match strings that our customers provide, and I use the resulting scores to find those customers in our database. We don't have a unique key such as SSN, SIN or CPF like most other places do, and we can't ask the customer for that information.

Now I'm trying to use the same jaro_winkler function to allow the customer to get one character wrong in their date of birth. The problem is that the result changes depending on the position of the wrong character. I had calculated 0.96 as the expected result when the customer makes one typo, but as you can see below, the score differs depending on where the typo is.

Changing the last digit of the year, from 60 to 61:

SELECT UTL_MATCH.jaro_winkler ('12/10/1961','12/10/1960') FROM DUAL;

This gives me the score I was expecting, which is 0.96.

But when the day differs by one character instead, as you can see below:

SELECT UTL_MATCH.jaro_winkler ('11/10/1960','12/10/1960') FROM DUAL;

the score drops a lot, to 0.873333333333333.

I've tried many different single-character changes, and the result changes every time, depending on the data and on the position of the typo. So I'm wondering whether there is a way to make the matching more ~static~, so that it compares the whole string and gives me a result based on the whole string rather than on the position of the difference.

Plus, I've tried different format masks, such as YYYYMMDD, and nothing has worked so far.

Ytipsh
  • Jaro-Winkler seems like a poor algorithm. I'd guess that you'd want the `edit_distance` or `edit_distance_similarity` from `utl_match` instead. Jaro-Winkler gives priority to the leading characters of the string. – Justin Cave Sep 24 '15 at 17:47
  • @JustinCave Hi Justin, I've already tried edit_distance as well, but the scores seem even weirder when I change one character in different places. Thanks for the heads-up; I do think Jaro is a poor algorithm as well, though – Ytipsh Sep 24 '15 at 17:50
  • I don't understand. The `edit_distance` between '12/10/1961' and either '12/10/1960', '12/11/1961', or '11/10/1961' is 1. That seems to be exactly what you are asking for. If that is not what you want, can you clarify your question? – Justin Cave Sep 24 '15 at 17:54
  • @JustinCave yep, that's true, but the distance between '12/09/1961' and '12/10/1961' is 2, because customers pick their month from a drop-down list – Ytipsh Sep 24 '15 at 18:15
  • So you're not looking for string typos at all then. So any string matching algorithm is a poor approach. You can take two `date` values (not strings), calculate the interval between them, then sum the absolute value difference of years, months, and days. – Justin Cave Sep 24 '15 at 18:32
  • @JustinCave Hey Justin, I've been following your tip and it looks good. I'll let you know if it works for me, thanks a lot – Ytipsh Sep 24 '15 at 21:49
  • @JustinCave after some adjustments and some other design considerations for the process, it worked like a charm. Thanks for the tip, Justin! – Ytipsh Sep 25 '15 at 12:46
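
The approach suggested in the comments (compare `date` values component by component instead of comparing strings) might look roughly like the query below. This is only a sketch under assumptions: the thread does not show the final solution, the `DD/MM/YYYY` mask is inferred from the comment about the month drop-down, and the names `pairs`, `entered`, `stored` and `component_diff` are made up for illustration.

```sql
-- Sketch: score two dates of birth by summing the absolute differences
-- of their year, month and day components.
-- Assumption: the strings are in DD/MM/YYYY format.
WITH pairs AS (
  -- Year differs by one (first example from the question)
  SELECT TO_DATE('12/10/1961', 'DD/MM/YYYY') AS entered,
         TO_DATE('12/10/1960', 'DD/MM/YYYY') AS stored
  FROM DUAL
  UNION ALL
  -- Month differs by one step in the drop-down (edit_distance would report 2)
  SELECT TO_DATE('12/09/1961', 'DD/MM/YYYY'),
         TO_DATE('12/10/1961', 'DD/MM/YYYY')
  FROM DUAL
)
SELECT entered,
       stored,
       ABS(EXTRACT(YEAR  FROM entered) - EXTRACT(YEAR  FROM stored))
     + ABS(EXTRACT(MONTH FROM entered) - EXTRACT(MONTH FROM stored))
     + ABS(EXTRACT(DAY   FROM entered) - EXTRACT(DAY   FROM stored)) AS component_diff
FROM pairs;
```

Both rows come out with a `component_diff` of 1, regardless of which component changed. A score like this does not depend on the position of the changed character, so a threshold such as "accept when `component_diff` is at most 1" could stand in for the 0.96 cut-off; where exactly to set that threshold is a design choice the thread leaves open.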

0 Answers