I have two lists of entities, say Names and Emails, and a function that measures the distance between an Email and a Name. Below, for each Email, are the measured distances to each Name.
1@ - {A:0.2, B:0.3, C:0.4, D:0.6}
2@ - {A:0.15, B:0.2, C:0.2, D:0.5}
3@ - {A:0.1, B:0.05, C:0.03, D:0.2}
Now I want to find the single minimum-distance Name for each Email. However, if two Emails have the same minimum-distance Name candidate, the Email with the smaller distance wins. The other Email should then select its second-closest Name candidate, and the check is repeated.
So, in this case the result should be:
1@: B
2@: A
3@: C
The same data as a table:
emails/names | A | B | C | D |
---|---|---|---|---|
1@ | 0.2 | 0.3 | 0.4 | 0.6 |
2@ | 0.15 | 0.2 | 0.2 | 0.5 |
3@ | 0.1 | 0.05 | 0.03 | 0.2 |
Speed is important. The data can be processed as a DataFrame or as dicts; either is fine.
Thanks for any help.
UPD:
The number of Emails can be greater than the number of Names, in which case some Emails will remain unassigned; I also need to catch those.
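
To make the rule concrete, here is a minimal sketch of it in plain Python, working on dicts. The function name `assign_names` and the variable `distances` are hypothetical, chosen only for illustration. It assumes that "check again" means a displaced Email can in turn displace the current holder of its next candidate (a deferred-acceptance style loop), and that on an exact tie the Email already holding the Name keeps it. Emails that run out of candidates map to `None`.

```python
from collections import deque


def assign_names(distances):
    """Map each email to a distinct name, or to None if unassigned.

    `distances` maps email -> {name: distance}.
    """
    # Candidate names for each email, closest first.
    ranked = {
        email: sorted(cands, key=cands.get)
        for email, cands in distances.items()
    }
    next_idx = {email: 0 for email in distances}  # next candidate to try
    holder = {}                # name -> email currently holding it
    queue = deque(distances)   # emails still looking for a name

    while queue:
        email = queue.popleft()
        while next_idx[email] < len(ranked[email]):
            name = ranked[email][next_idx[email]]
            next_idx[email] += 1
            rival = holder.get(name)
            if rival is None:
                holder[name] = email
                break
            # Assumption: strict "<", so the current holder wins ties.
            if distances[email][name] < distances[rival][name]:
                holder[name] = email
                queue.append(rival)  # rival moves on to its next candidate
                break
        # If the candidates are exhausted, the email stays unassigned.

    result = {email: None for email in distances}
    for name, email in holder.items():
        result[email] = name
    return result


distances = {
    "1@": {"A": 0.2, "B": 0.3, "C": 0.4, "D": 0.6},
    "2@": {"A": 0.15, "B": 0.2, "C": 0.2, "D": 0.5},
    "3@": {"A": 0.1, "B": 0.05, "C": 0.03, "D": 0.2},
}
print(assign_names(distances))
# {'1@': 'B', '2@': 'A', '3@': 'C'}
```

This follows the rule as stated rather than minimizing the total distance over all pairs, so it may differ from a globally optimal assignment.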