I have a list of 'ids':
ids = [None, '20160928a', '20160929a', ... ]
and another list of ids that I found to be duplicates using fuzzywuzzy:
repeat_offenders = ['20160928a', '20161115a', '20161121a', ... ]
I would like to use fuzzywuzzy again to build a list of lists, where each inner list holds the indices at which a given duplicate id occurs in 'ids'. The output would look something like this (and because they are duplicates, each inner list would contain at least two elements):
collected_ids = [[0,5,700], [6,3], [4,826,12]]
My attempt, which currently returns the matching ids themselves rather than their locations:
from fuzzywuzzy import process

collected_urls = []
for offender in repeat_offenders[:10]:
    # process.extract returns (match, score) tuples, so match[0] is the
    # matched string itself, not its position in ids
    best_match = process.extract(offender, ids)
    collection = []
    for match in best_match:
        if match[1] > 95:
            collection.append(match[0])
    collected_urls.append(collection)
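One direction I'm considering (an untested sketch, scoring each id directly with fuzz.ratio and using enumerate to keep track of positions; the None entry is skipped since it can't be scored):

from fuzzywuzzy import fuzz

collected_ids = []
for offender in repeat_offenders[:10]:
    # enumerate pairs each id with its index, so we collect positions
    # instead of the matched strings
    matches = [i for i, id_ in enumerate(ids)
               if id_ is not None and fuzz.ratio(offender, id_) > 95]
    # a duplicate should match in at least two places
    if len(matches) > 1:
        collected_ids.append(matches)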
Update: my attempt at using Moe's answer to find/group exact matches:
collected_ids = []
for i in range(len(ids)):
    tmp = [i]
    # scan the whole list for other positions holding the same id
    for j in range(len(ids)):
        if ids[i] == ids[j] and i != j:
            tmp.append(j)
    # only keep ids that actually occur more than once
    if len(tmp) > 1:
        collected_ids.append(tmp)
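For the exact-match case, I suspect a single pass with a dict keyed by id would avoid both the quadratic scan and the repeated groups (e.g. getting both [0, 5] and [5, 0]); an untested sketch:

from collections import defaultdict

positions = defaultdict(list)
for i, id_ in enumerate(ids):
    if id_ is not None:  # skip the None placeholder
        positions[id_].append(i)

# keep only ids that occur at more than one index
collected_ids = [idxs for idxs in positions.values() if len(idxs) > 1]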