Query to aggregate rows based on string similarity

Asked Mar 20 '23 at 21:15

Active Mar 20 '23 at 21:20


I want to aggregate a dataset on a string field, grouping rows whose values are similar to each other. For example, the following table shows some sample values:

ID	Name
0231	Ebrahim Talaq
45621	Ebrahm Talaq
32134	Ebrahim Talaq L.L.C
5431234	Martin Cole

The end result of the grouping should be like the following:

Name	Count
a value for Ebrahim Talaq, Ebrahm Talaq, Ebrahim Talaq L.L.C	3
Martin Cole	1

I could join the table with itself and calculate the distance between each pair of strings (fuzzy matching or Jaro distance, for example), but that would be a multi-step process.
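To make that multi-step process concrete, here is a minimal sketch in Python rather than SQL. It rests on several assumptions that are not part of the question: `difflib.SequenceMatcher` stands in for a Jaro-style distance, the 0.75 threshold is arbitrary, the helper names `similarity` and `group_similar` are hypothetical, and the first name seen in each group is used as its representative.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Sample rows from the table above: (ID, Name).
rows = [
    ("0231", "Ebrahim Talaq"),
    ("45621", "Ebrahm Talaq"),
    ("32134", "Ebrahim Talaq L.L.C"),
    ("5431234", "Martin Cole"),
]

# Assumed cut-off; it would need tuning on real data.
THRESHOLD = 0.75


def similarity(a, b):
    """Case-insensitive similarity in [0, 1]; a stand-in for Jaro distance."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_similar(rows, threshold=THRESHOLD):
    """Return (representative name, count) for each cluster of similar names."""
    # Union-find structure: every ID starts as the root of its own group.
    parent = {row_id: row_id for row_id, _ in rows}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    # Step 1: the "self-join", comparing every pair of rows once.
    for (id_a, name_a), (id_b, name_b) in combinations(rows, 2):
        # Step 2: merge the two groups when the names are close enough.
        if similarity(name_a, name_b) >= threshold:
            parent[find(id_b)] = find(id_a)

    # Step 3: aggregate the names by the root of their group.
    groups = {}
    for row_id, name in rows:
        groups.setdefault(find(row_id), []).append(name)

    # The first name encountered is used as the group's label.
    return [(names[0], len(names)) for names in groups.values()]


for name, count in group_similar(rows):
    print(name, count)
```

Because the groups are merged transitively, two names that are not similar to each other can still land in the same group through a chain of intermediate matches, so the threshold has to be chosen with care.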

Does anyone have a better idea?

edited Mar 20 '23 at 21:20


Ebrahim Talaq


The first thing to do is read this answer: [Group by similarity of string on a single table](https://stackoverflow.com/questions/56536983/group-by-similarity-of-string-on-a-single-table). Just for practice I made [this fiddle](https://dbfiddle.uk/9nnvACJ7), which uses Jaro-Winkler, but grouping strings this way is not a good idea. – Ponder Stibbons Mar 20 '23 at 23:34
