I have 1000 regex patterns that I need to search for in each of 9000 strings. A normal brute-force loop over a plain list (taken from pandas) took 25 minutes for this task. Wrapping the function with dask's delayed to parallelize it brought that down to 9 minutes, but I need more speedup. How can I leverage dask arrays or a dask DataFrame for this, or is there a faster way to do it?
import re

def match(string):
    # Collect the patterns that occur in this string
    return [regex for regex in regex_list if re.search(regex, string)]

results = [match(x) for x in string_list]
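
For reference, a minimal sketch of how I parallelized it with dask.delayed, assuming the same regex_list and string_list as above; the chunk size and the match_chunk helper are illustrative, not exactly what I ran:

import re
import dask

def match_chunk(strings):
    # One task handles a slice of the strings to keep scheduling overhead small
    return [[regex for regex in regex_list if re.search(regex, s)] for s in strings]

chunk_size = 500  # illustrative; tune to the number of workers
chunks = [string_list[i:i + chunk_size] for i in range(0, len(string_list), chunk_size)]
tasks = [dask.delayed(match_chunk)(chunk) for chunk in chunks]
results = dask.compute(*tasks)  # executes the chunks in parallel with dask's default scheduler

Chunking the strings instead of creating one delayed task per string keeps the number of tasks small, so dask's per-task overhead doesn't eat into the speedup.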