I have 1,000 regex patterns that I have to search for in each of 9,000 strings. A normal brute-force pass over a plain list (pulled from pandas) took 25 minutes for this task. Using dask's delayed function to parallelize the whole thing brought it down to 9 minutes, but I need more speedup. How can I leverage dask arrays or a dask dataframe for this? Or is there a faster way to do it?

import re

def match(string):
    # Collect every pattern that occurs in the string.
    return [regex for regex in regex_list if re.search(regex, string)]

results = [match(x) for x in string_list]
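For reference, my delayed version looks roughly like this (a minimal sketch: the chunk size of 500 is an assumption, and match_chunk just applies the brute-force check above to a slice of the strings):

import re
import dask

def match_chunk(strings):
    # Scan one chunk of strings against every pattern.
    return [[regex for regex in regex_list if re.search(regex, s)]
            for s in strings]

# Split the 9,000 strings into chunks; each chunk becomes one delayed task.
chunk_size = 500  # assumed value, not tuned
chunks = [string_list[i:i + chunk_size]
          for i in range(0, len(string_list), chunk_size)]
tasks = [dask.delayed(match_chunk)(chunk) for chunk in chunks]
results = dask.compute(*tasks)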
