I have a big list, jsn, which contains many string elements, some of which may be duplicates. I need to compare each element with the others for similarity and collect the indexes of the duplicate items in a list called dubs, so that I can then remove those items from jsn.
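Exact duplicates do not need pairwise comparison at all. Dictionary keys are unique and, since Python 3.7, keep insertion order, so one pass removes repeated strings and keeps the first occurrence of each. This is a swapped-in technique, not part of the original code: running it first can shrink the list before the quadratic SequenceMatcher pass, which then only has to catch near-duplicates. The helper name drop_exact_duplicates is hypothetical.

```python
def drop_exact_duplicates(items):
    """Keep the first occurrence of each string, in original order, in O(n)."""
    return list(dict.fromkeys(items))

jsn = ["apple", "banana", "apple", "cherry", "banana"]
print(drop_exact_duplicates(jsn))  # ['apple', 'banana', 'cherry']
```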
Because jsn is so large, I decided to use threading to speed up the second for loop and cut down the waiting time.
But Thread/Process is not working as I expected.
With the Thread version below, performance does not change at all, and the dubs list is also empty after all threads have been joined.
I also tried it without .join(), but I still got an empty dubs list and no change in performance.
The main problem: the dubs list is empty by the time I start deleting the duplicates.
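For reference, threading.Thread workers run inside the same process, so a list passed to them is the very object the caller holds, and every append is visible once join() returns. A multiprocessing.Process, by contrast, works on its own copy of jsn and dubs, so its appends never reach the parent; that would explain an empty dubs if the Process variant was the one being run. Threads also cannot speed up this loop: difflib's SequenceMatcher is pure-Python, CPU-bound code, and in the standard CPython build the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time. A minimal sketch of the shared-memory behaviour, using a hypothetical helper collect:

```python
from threading import Thread

def collect(shared, value):
    # Runs in a thread of the same process, so this appends to the caller's list object
    shared.append(value)

dubs = []
threads = [Thread(target=collect, args=(dubs, i)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # once every thread has finished, all appends are visible here
print(sorted(dubs))  # [0, 1, 2, 3, 4]
```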
from threading import Thread
from multiprocessing import Process
from difflib import SequenceMatcher

# jsn: the big list of strings, loaded elsewhere
dubs = []  # indexes of duplicate items to delete

# Search for duplicates of jsn[a] among the items after it
def finddubs(jsn, dubs, a):
    for b in range(a + 1, len(jsn)):
        # ratio() returns a float in [0, 1], so 40% similarity is 0.4
        if jsn[a] == jsn[b] or SequenceMatcher(None, jsn[a], jsn[b]).ratio() > 0.4:
            dubs.append(b)  # add the duplicate's index to the dubs list

# Start one thread per element
threads = []
for a in range(len(jsn)):
    t = Thread(target=finddubs, args=(jsn, dubs, a))
    threads.append(t)
    t.start()
for thr in threads:
    thr.join()

# Delete duplicate list items, highest index first so earlier indexes stay valid
for k in sorted(set(dubs), reverse=True):
    del jsn[k]
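Deleting by position inside a loop is easy to get wrong: each del shifts every later element one place to the left, so the indexes still waiting in dubs must be processed from highest to lowest (with repeated indexes dropped), or the list rebuilt in one pass. A sketch of the rebuild approach, with a hypothetical helper remove_indexes:

```python
def remove_indexes(items, indexes):
    """Return a new list without the elements at the given positions."""
    drop = set(indexes)  # O(1) membership tests; repeated indexes collapse to one
    return [item for i, item in enumerate(items) if i not in drop]

jsn = ["apple", "banana", "apple", "cherry", "banana"]
dubs = [2, 4, 4]
jsn = remove_indexes(jsn, dubs)
print(jsn)  # ['apple', 'banana', 'cherry']
```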
Without threading, the code works.
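Because the comparison is CPU-bound, real parallelism in CPython needs processes, and since processes do not share the parent's lists, each task should return its indexes instead of appending to a shared dubs. The sketch below is one way to do that with multiprocessing.Pool; find_duplicate_indexes, later_duplicates, _init_worker and the 0.4 threshold are assumptions for illustration, not the original code. Note that SequenceMatcher.ratio() returns a float between 0 and 1, so a cut-off like 40 can never be exceeded. Each item is compared only with the items after it, so an element is never matched against itself and each pair is checked once.

```python
from difflib import SequenceMatcher
from multiprocessing import Pool

THRESHOLD = 0.4  # assumed cut-off; SequenceMatcher.ratio() is a float in [0, 1]
_items = []      # the list each worker searches; set once per worker by _init_worker

def _init_worker(items):
    """Give the worker its own reference to the list (sent once, not once per task)."""
    global _items
    _items = items

def later_duplicates(a):
    """Return indexes b > a whose item equals or resembles _items[a]."""
    found = []
    for b in range(a + 1, len(_items)):
        if _items[a] == _items[b] or SequenceMatcher(None, _items[a], _items[b]).ratio() > THRESHOLD:
            found.append(b)
    return found

def find_duplicate_indexes(items, workers=4):
    """Return the sorted, de-duplicated indexes of items that repeat an earlier item."""
    if workers == 1:
        _init_worker(items)  # run in this process, handy for debugging and tests
        results = list(map(later_duplicates, range(len(items))))
    else:
        with Pool(processes=workers, initializer=_init_worker, initargs=(items,)) as pool:
            # Each worker returns its own list; the parent merges them, nothing is shared
            results = pool.map(later_duplicates, range(len(items)))
    dubs = set()
    for found in results:
        dubs.update(found)
    return sorted(dubs)
```

A call such as `dubs = find_duplicate_indexes(jsn, workers=4)` should sit inside an `if __name__ == "__main__":` block, which multiprocessing requires where it uses the spawn start method (the default on Windows and macOS). The work is still O(n²) pairwise comparisons, so extra processes only divide that cost by roughly the number of cores.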