I have a table with thousands of rows.
Sample data:
user_id  ZIP      City    email
105      100051   Lond.   jsmith@hotmail.com
382      251574           jgjefferson@gmail.com
225      0100051  London  john.smith@hotmail.com
I need to compare every user with all the others to find out which ones are similar.
In this example, users 105 and 225 are almost the same (the ZIP differs only by a leading zero, the city is abbreviated, and the email is a variant of the same name), so the expected result is a new column, new_id, that gives both of them the same id:
user_id  ZIP      City    email                   new_id
105      100051   Lond.   jsmith@hotmail.com      105
382      251574           jgjefferson@gmail.com   382
225      0100051  London  john.smith@hotmail.com  105
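How can I compare each field of a user against the same field of every other user, and how should I decide that two records are similar enough to be grouped? Would clustering, for example, be a suitable approach?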
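
To make the expected result concrete, here is a minimal sketch of one possible approach, not a definitive method. It assumes the values are stored as strings (so the leading zero in the ZIP survives), that string similarity from Python's standard difflib module is an acceptable measure, and that empty fields are simply skipped. The helper names (field_similarity, record_similarity, assign_new_ids) and the 0.8 threshold are hypothetical and chosen only for illustration; a real data set would need its own threshold and probably per-field weights.

```python
from difflib import SequenceMatcher

# Sample rows from the question; all values kept as strings.
rows = [
    {"user_id": 105, "zip": "100051", "city": "Lond.", "email": "jsmith@hotmail.com"},
    {"user_id": 382, "zip": "251574", "city": "", "email": "jgjefferson@gmail.com"},
    {"user_id": 225, "zip": "0100051", "city": "London", "email": "john.smith@hotmail.com"},
]


def field_similarity(a, b):
    """Similarity in [0, 1], or None when either value is missing."""
    if not a or not b:
        return None
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_similarity(r1, r2, fields=("zip", "city", "email")):
    """Average similarity over the fields that both records have."""
    scores = [field_similarity(r1[f], r2[f]) for f in fields]
    scores = [s for s in scores if s is not None]
    return sum(scores) / len(scores) if scores else 0.0


def assign_new_ids(rows, threshold=0.8):
    """Group records whose similarity reaches the (assumed) threshold.

    Uses a small union-find so that chains of similar records share one id;
    the id of a group is the user_id of its earliest row.
    """
    parent = {r["user_id"]: r["user_id"] for r in rows}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Compare every pair of records once.
    for i, r1 in enumerate(rows):
        for r2 in rows[i + 1:]:
            if record_similarity(r1, r2) >= threshold:
                a, b = find(r1["user_id"]), find(r2["user_id"])
                if a != b:
                    parent[b] = a  # keep the root of the earlier row

    return {r["user_id"]: find(r["user_id"]) for r in rows}


new_ids = assign_new_ids(rows)
print(new_ids)  # {105: 105, 382: 382, 225: 105}
```

On the three sample rows this reproduces the new_id column shown above. Comparing every pair is quadratic in the number of rows, so for a much larger table it would likely be worth comparing only within candidate groups first (for example, rows that share the same email domain), which is again an assumption about the data rather than a requirement.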