I'm looking for a way to search through a database and find close similarities between email addresses. The only solution I can think of is O(N^2) and involves a nested loop: grab an email address, check it against the rest of the addresses, and repeat for every address. This will be extremely time-consuming, as I'm dealing with 100,000 email addresses in a database. If it makes a difference, this will be implemented as a background job for a Ruby on Rails app.
Is there a more efficient way to do this?
I'm really only looking for basic similarities. An example would be:
docjohnson@gmail.com
docjohnson1@gmail.com
docjohnson333@gmail.com
docjohnson@hotmail.com
I would want those all marked similar to each other.
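To make the nested-loop idea concrete, here is a rough sketch of what I have in mind. The helpers `base_name`, `similar?`, and `similar_pairs` are placeholder names I made up for illustration, and the similarity rule (ignore the domain and any trailing digits) is only an assumption based on the example above, not a settled definition.

```ruby
# Hypothetical helper: reduce an address to a rough "base" form by
# dropping the domain and any trailing digits in the local part.
def base_name(email)
  local = email.downcase.split("@").first
  local.sub(/\d+\z/, "")
end

# Hypothetical similarity rule (an assumption for this sketch only).
def similar?(a, b)
  base_name(a) == base_name(b)
end

# Naive O(N^2) pass: compare every address against every later one.
def similar_pairs(emails)
  pairs = []
  emails.each_with_index do |a, i|
    emails[(i + 1)..-1].each do |b|
      pairs << [a, b] if similar?(a, b)
    end
  end
  pairs
end

emails = %w[
  docjohnson@gmail.com
  docjohnson1@gmail.com
  docjohnson333@gmail.com
  docjohnson@hotmail.com
]
pairs = similar_pairs(emails)
```

With the four addresses above, every pair would be flagged. With 100,000 addresses the inner comparison would run roughly 5 billion times, which is the part I'd like to avoid.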
Thanks for the help!
EDIT: I'm using a MongoDB database connected to Rails via Mongoid, if that changes the game at all.