Group similar usernames together

Question

My data is as follows. The first entry, 'tim', matches tim.rand and timrook. Similarly, pankit090 matches pankit001, pankit002, pankit003, pankit004 and pankit005.

I want the result to be like below

What I have been able to achieve so far is

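from fuzzywuzzy import process

# `database` is assumed to be a pandas DataFrame with a 'Names' column.
emailsdb = database['Names'].values.tolist()
matches = []  # named `matches` rather than `list` to avoid shadowing the built-in
for email in emailsdb:
    # Compare each name against every other name, excluding itself.
    newlookup = emailsdb.copy()
    newlookup.remove(email)
    result = process.extractBests(email, newlookup, score_cutoff=85, limit=50)
    if result:
        matches.append(email)
        matches.append(result)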

What I get is

['tim',
 [('tim.rand', 90), ('timrook', 90)],
 'tim.rand',
 [('tim', 90)],
 'pankit090',
 [('pankit001', 89),
  ('pankit002', 89),
  ('pankit003', 89),
  ('pankit004', 89),
  ('pankit005', 89)],
 'timrook',
 [('tim', 90)],
 'pankit001',
 [('pankit090', 89),
  ('pankit002', 89),
  ('pankit003', 89),
  ('pankit004', 89),
  ('pankit005', 89)],
 'pankit002',
 [('pankit090', 89),
  ('pankit001', 89),
  ('pankit003', 89),
  ('pankit004', 89),
  ('pankit005', 89)],
...........
...........

I am looking for suggestions on how to get from this output to the final result in the picture above: two line items, one for each group of usernames that fuzzywuzzy was able to match.

I also need the count of distinct TID and distinct PID values in each group.
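One possible direction, offered as a rough sketch rather than a definitive solution: treat every fuzzy match as a link between two names and merge linked names into groups with a small union-find, so that 'tim.rand' and 'timrook' end up together through 'tim' even though they do not match each other directly. The function names `group_matches` and `distinct_ids` are hypothetical, and the `(name, tid, pid)` row layout is an assumption, since the actual table columns are not shown here.

```python
from collections import defaultdict


def group_matches(flat):
    """Merge fuzzy-match output into groups of linked names.

    `flat` alternates a name with its list of (match, score) tuples,
    the same layout as the output shown above.
    """
    parent = {}

    def find(x):
        # Return the representative of x's group, registering x if new.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Every (name, match) pair links two names into the same group.
    for name, matches in zip(flat[0::2], flat[1::2]):
        for other, _score in matches:
            union(name, other)

    groups = defaultdict(list)
    for name in list(parent):
        groups[find(name)].append(name)
    return sorted(sorted(members) for members in groups.values())


def distinct_ids(groups, rows):
    """Count distinct TID and PID values per group.

    `rows` is an iterable of (name, tid, pid) tuples (assumed layout).
    Returns a list of (members, n_distinct_tid, n_distinct_pid).
    """
    group_of = {name: i for i, members in enumerate(groups) for name in members}
    tids = defaultdict(set)
    pids = defaultdict(set)
    for name, tid, pid in rows:
        if name in group_of:  # names without any match are skipped
            tids[group_of[name]].add(tid)
            pids[group_of[name]].add(pid)
    return [(members, len(tids[i]), len(pids[i]))
            for i, members in enumerate(groups)]
```

Passing the list built by the loop above to `group_matches` would give one sorted list of names per group. With a pandas DataFrame, the rows could presumably be supplied via something like `database[['Names', 'TID', 'PID']].itertuples(index=False)`, assuming those column names.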
