I have 1,000 regex patterns that I have to search for in each of 9,000 strings. A normal brute-force pass over a plain list (pulled from pandas) took 25 minutes for this task. Using dask's delayed function to parallelize the whole thing brought it down to 9 minutes, but I need more speedup. How can I leverage dask arrays or a dask dataframe for this? Or is there a faster way to do it?

import re

def match(string):
    # Collect every pattern that occurs in the string.
    return [regex for regex in regex_list if re.search(regex, string)]

results = [match(x) for x in string_list]
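For reference, my delayed version looks roughly like this (a minimal sketch: the chunk size of 500 is an assumption, and match_chunk just applies the brute-force check above to a slice of the strings):

import re
import dask

def match_chunk(strings):
    # Scan one chunk of strings against every pattern.
    return [[regex for regex in regex_list if re.search(regex, s)]
            for s in strings]

# Split the 9,000 strings into chunks; each chunk becomes one delayed task.
chunk_size = 500  # assumed value, not tuned
chunks = [string_list[i:i + chunk_size]
          for i in range(0, len(string_list), chunk_size)]
tasks = [dask.delayed(match_chunk)(chunk) for chunk in chunks]
results = dask.compute(*tasks)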
