I want to use the multiprocessing module to run phrase matching on documents in parallel. My idea was to create a PhraseMatcher object in one process and then share it among multiple processes by making copies of the object. The code seems to fail silently, without giving any kind of error. To make things simpler, I reduced it to the following example, which demonstrates what I am trying to achieve:
import copy
import spacy
from spacy.matcher import PhraseMatcher
nlp = spacy.load('en')
color_patterns = [nlp(text) for text in ('red', 'green', 'yellow')]
product_patterns = [nlp(text) for text in ('boots', 'coats', 'bag')]
material_patterns = [nlp(text) for text in ('silk', 'yellow fabric')]
matcher = PhraseMatcher(nlp.vocab)
matcher.add('COLOR', None, *color_patterns)
matcher.add('PRODUCT', None, *product_patterns)
matcher.add('MATERIAL', None, *material_patterns)
matcher2 = copy.deepcopy(matcher)
doc = nlp("yellow fabric")
matches = matcher2(doc)
for match_id, start, end in matches:
    rule_id = nlp.vocab.strings[match_id]  # get the unicode ID, i.e. 'COLOR'
    span = doc[start:end]                  # get the matched slice of the doc
    print(rule_id, span.text)
With the matcher2 object I get no output at all, but with the original matcher object I get the expected results:
COLOR yellow
MATERIAL yellow fabric
I have been stuck on this for a couple of days. Any help will be deeply appreciated.
Thank you.