0

I want to aggregate a dataset based on string field and how similar values are. For example, the following table will show some values:

ID Name
0231 Ebrahim Talaq
45621 Ebrahm Talaq
32134 Ebrahim Talaq L.L.C
5431234 Martin Cole

The end result of the grouping should be like the following:

Name Count
a value for Ebrahim Talaq, Ebrahm Talaq, Ebrahim Talaq L.L.C 3
Martin Cole 1

I could do a process where I do a join with the same table and calculate the distances between the strings (fuzzy match or Jaro-distance as examples) but that will be multi-step process.

Anyone have a better idea?

Ebrahim Talaq
  • 169
  • 1
  • 11
  • 1
    First thing is to read this answer [Group by similarity of string on a single table](https://stackoverflow.com/questions/56536983/group-by-similarity-of-string-on-a-single-table). Just for practice I made [this fiddle](https://dbfiddle.uk/9nnvACJ7) which uses Jaro-Winkler, but grouping strings this way is not good idea. – Ponder Stibbons Mar 20 '23 at 23:34

0 Answers0