My data is as follows. As you can see, the first entry is 'tim', which matches tim.rand and timrook. Similarly, pankit090 matches pankit001, pankit002, pankit003, pankit004 and pankit005.
I want the result to look like this:
What I have been able to achieve so far is:
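from fuzzywuzzy import process

# database is a pandas DataFrame with a 'Names' column
emailsdb = database['Names'].values.tolist()
matches = []  # renamed from `list` to avoid shadowing the built-in
for email in emailsdb:
    # compare each name against every other name, excluding itself
    newlookup = emailsdb.copy()
    newlookup.remove(email)
    result = process.extractBests(email, newlookup, score_cutoff=85, limit=50)
    if len(result) > 0:
        matches.append(email)
        matches.append(result)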
What I get is:
['tim',
[('tim.rand', 90), ('timrook', 90)],
'tim.rand',
[('tim', 90)],
'pankit090',
[('pankit001', 89),
('pankit002', 89),
('pankit003', 89),
('pankit004', 89),
('pankit005', 89)],
'timrook',
[('tim', 90)],
'pankit001',
[('pankit090', 89),
('pankit002', 89),
('pankit003', 89),
('pankit004', 89),
('pankit005', 89)],
'pankit002',
[('pankit090', 89),
('pankit001', 89),
('pankit003', 89),
('pankit004', 89),
('pankit005', 89)],
...........
...........
I am looking for a suggestion on how to get from this output to the final result shown in the picture above, with just 2 line items, i.e. only the groups for which fuzzywuzzy was able to find matching user names.
I also need the count of distinct TID and distinct PID values in each group.
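To make the target concrete, here is a minimal sketch of one possible approach, not a definitive solution. It takes the flat `[name, matches, name, matches, ...]` list shown above, merges names that match each other into groups with a small union-find, and counts distinct TID and PID values per group. The `records` rows and their TID/PID values are invented for illustration (the real columns are only visible in the pictures), and the helper names `find`, `group_names` and `summarise` are hypothetical.

```python
from collections import defaultdict

# Flat output in the shape shown above (truncated to the visible entries).
pankit = ['pankit090', 'pankit001', 'pankit002', 'pankit003', 'pankit004', 'pankit005']
flat = [
    'tim', [('tim.rand', 90), ('timrook', 90)],
    'tim.rand', [('tim', 90)],
    'pankit090', [(n, 89) for n in pankit if n != 'pankit090'],
    'timrook', [('tim', 90)],
    'pankit001', [(n, 89) for n in pankit if n != 'pankit001'],
    'pankit002', [(n, 89) for n in pankit if n != 'pankit002'],
]
# Pair each name with the list of matches that follows it.
matches = dict(zip(flat[0::2], flat[1::2]))

# ASSUMPTION: made-up (name, TID, PID) rows standing in for the real data.
records = [
    ('tim', 'T1', 'P1'), ('tim.rand', 'T1', 'P2'), ('timrook', 'T2', 'P2'),
    ('pankit090', 'T3', 'P3'), ('pankit001', 'T3', 'P3'), ('pankit002', 'T4', 'P3'),
    ('pankit003', 'T4', 'P4'), ('pankit004', 'T5', 'P4'), ('pankit005', 'T5', 'P4'),
]

def find(parent, name):
    """Return the representative of the group that `name` belongs to."""
    parent.setdefault(name, name)
    while parent[name] != name:
        parent[name] = parent[parent[name]]  # path compression
        name = parent[name]
    return name

def group_names(matches):
    """Merge every name with all of its fuzzy matches into one group."""
    parent = {}
    for name, found in matches.items():
        for other, _score in found:
            root_a, root_b = find(parent, name), find(parent, other)
            if root_a != root_b:
                parent[root_b] = root_a
    groups = defaultdict(list)
    for name in list(parent):
        groups[find(parent, name)].append(name)
    return list(groups.values())

def summarise(groups, records):
    """Build one row per group with its distinct TID and PID counts."""
    rows = []
    for members in groups:
        member_set = set(members)
        tids = {tid for name, tid, _pid in records if name in member_set}
        pids = {pid for name, _tid, pid in records if name in member_set}
        rows.append({'names': sorted(members),
                     'distinct_tid': len(tids),
                     'distinct_pid': len(pids)})
    return rows

groups = group_names(matches)
summary = summarise(groups, records)
for row in summary:
    print(row)
```

Because matches are merged transitively, 'tim.rand' and 'timrook' end up in the same group through 'tim' even though they do not match each other directly. That may or may not be the intended behaviour for looser cutoffs, so it is worth checking against the real data.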