
I am using Python Dedupe to de-duplicate our MDM database. So far it works fine after sufficient training, and it produces an entity map table that shows the cluster IDs, canonical names, and a score.

I'm stuck on one point: when a new record is inserted into the database, I'm not sure how to merge it into the existing clusters in the entity_map table. I could not find a function for this in the dedupe documentation either.

Running the entire process (creating the blocking map, the plural key, and the clustered dupes) again for the new records would be costly, so I'm looking for a less expensive way to assign new records to the existing clusters in the entity_map table.
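To make the goal concrete, here is a minimal sketch of the kind of incremental assignment being asked for. It is not dedupe's API: it uses plain Python with `difflib` as a stand-in similarity measure, and the `entity_map` dictionary, the three-letter blocking key, the function names, and the 0.8 threshold are all assumptions made up for illustration. The idea is to compare a new record only against the canonical names of clusters in the same block, then either join the best match or open a new cluster.

```python
from difflib import SequenceMatcher

# Hypothetical stand-in for the entity_map table: cluster_id -> canonical name.
entity_map = {
    1: "Acme Corporation",
    2: "Globex Industries",
}


def block_key(name):
    """Cheap blocking key (an assumption): first three lowercased characters."""
    return name.strip().lower()[:3]


def build_block_index(entity_map):
    """Group cluster ids by blocking key so that a new record is compared
    only against clusters that share its block."""
    index = {}
    for cluster_id, canonical in entity_map.items():
        index.setdefault(block_key(canonical), []).append(cluster_id)
    return index


def assign_cluster(name, entity_map, index, threshold=0.8):
    """Return (cluster_id, score) for a new record.

    The record joins the most similar existing cluster in its block when the
    similarity reaches the threshold; otherwise a new cluster is created and
    registered in both entity_map and the block index.
    """
    best_id, best_score = None, 0.0
    for cluster_id in index.get(block_key(name), []):
        canonical = entity_map[cluster_id]
        score = SequenceMatcher(None, name.lower(), canonical.lower()).ratio()
        if score > best_score:
            best_id, best_score = cluster_id, score

    if best_id is not None and best_score >= threshold:
        return best_id, best_score

    # No sufficiently similar cluster: open a new one with this record
    # as its canonical name.
    new_id = max(entity_map, default=0) + 1
    entity_map[new_id] = name
    index.setdefault(block_key(name), []).append(new_id)
    return new_id, 1.0
```

With this sketch, the index is built once with `build_block_index(entity_map)` and each inserted record costs only a handful of comparisons via `assign_cluster(new_name, entity_map, index)`. A real setup would presumably replace the string ratio with the trained model's score and the prefix key with the learned blocking rules.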

min2bro
  • I don't suppose you got anywhere with this? Similar issues here... – Gavin Gilmour Nov 23 '18 at 07:57
  • @GavinGilmour I solved this issue myself. Do you need any help? – min2bro Nov 23 '18 at 08:15
  • Thanks for getting back. I'm going down the same track as you, using the Gazetteer class, so I just wondered whether you stuck with this solution in the end and it worked for you (or you pursued some alternative). Reassuring to know this is a good way of solving this use case. – Gavin Gilmour Nov 23 '18 at 16:09

0 Answers