Speeding up difflib.SequenceMatcher comparisons in Python with Numba

Asked Dec 31 '19 at 05:56

Active Dec 31 '19 at 05:56

Viewed 192 times

I get an error when I try to speed up the code below with Numba. Are there any other ways to speed it up? Note that "a" and "b" are pandas DataFrames. I also have a GTX 1070 Ti; is there any way to make use of the GPU as well?

from difflib import SequenceMatcher
from numba import jit
import time

z = []
x = []

st = time.time()

@jit
def calc_dist(a, b):
    for i in a['name']:
        for j in (b['name']):
            cc = SequenceMatcher(None, i.lower(), j.lower()).ratio()
            if cc > 0.8:
                z.append(i)
                x.append(j)


calc_dist(a, b)

en = time.time()

print(en-st)

Edward Liu

You can't pass a Pandas dataframe to a Numba function. You can't create a SequenceMatcher object from within a Numba function. – Thane Brooker Dec 31 '19 at 06:20

There are also [quick_ratio() and real_quick_ratio()](https://docs.python.org/3/library/difflib.html#difflib.SequenceMatcher.quick_ratio), which are faster but less strict: each returns only an upper bound on ratio(). – Shijith Dec 31 '19 at 06:28
No? @Thane Brooker a['name'] is just a list. – Edward Liu Dec 31 '19 at 06:57
@EdwardLiu your question says 'Note "a" and "b" are pandas dataframe.' You can pass lists to Numba. With Numba, it would be a fun project to rewrite SequenceMatcher, and yes, you could offload this to the GPU. But if you don't want to rewrite it, the easiest speedup would be to stick to plain Python and use multiprocessing rather than Numba. – Thane Brooker Dec 31 '19 at 12:15
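
The two suggestions in the comments (a cheap pre-filter with quick_ratio() / real_quick_ratio(), and multiprocessing instead of Numba) could be combined roughly as follows. This is only a minimal sketch under stated assumptions, not a benchmarked solution: the helper names `similar`, `matches_for_name` and `find_matches_parallel` are hypothetical, and the name columns are assumed to have been converted to plain lists of strings first.

```python
from difflib import SequenceMatcher
from functools import partial
from multiprocessing import Pool


def similar(s1, s2, threshold=0.8):
    """Return True if ratio() of the lowercased strings exceeds threshold."""
    sm = SequenceMatcher(None, s1.lower(), s2.lower())
    # real_quick_ratio() and quick_ratio() are documented upper bounds on
    # ratio(), so a pair that fails either check cannot pass the full test.
    # The `and` chain short-circuits, so ratio() only runs for survivors.
    return (sm.real_quick_ratio() > threshold
            and sm.quick_ratio() > threshold
            and sm.ratio() > threshold)


def matches_for_name(name_a, names_b, threshold=0.8):
    """Compare one name against every name in names_b (hypothetical helper)."""
    return [(name_a, name_b) for name_b in names_b
            if similar(name_a, name_b, threshold)]


def find_matches_parallel(names_a, names_b, threshold=0.8, processes=None):
    """Spread the outer loop over worker processes (hypothetical helper)."""
    worker = partial(matches_for_name, names_b=names_b, threshold=threshold)
    # processes=None lets Pool use one worker per available CPU.
    with Pool(processes) as pool:
        per_name = pool.map(worker, names_a)
    # Flatten the per-name lists into a single list of (i, j) pairs.
    return [pair for chunk in per_name for pair in chunk]
```

With the DataFrames from the question, this might be called as `find_matches_parallel(a['name'].tolist(), b['name'].tolist())`, placed under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. Returning the pairs, rather than appending to the global lists z and x, matters here because worker processes do not share the parent's globals.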

0 Answers