I am trying to use spaCy's EntityRuler patterns to match the different surface forms in which person names appear in my text:
LASTNAME, FIRSTNAME
FIRSTNAME, LASTNAME
FIRSTNAME LASTNAME (no punctuation)
This is what I tried:
import spacy
# create a nlp object with pretrained model
nlp = spacy.load('fr_core_news_lg')
ruler = nlp.add_pipe("entity_ruler", after='ner')
# define the patterns
patterns = [{"label":"PER", "pattern":[{"LOWER":"jules"},{"LOWER":"michelet"}]},
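# each dict in a token pattern matches exactly one token, so "jean joseph" is split into two
{"label":"PER", "pattern": [{"LOWER":"jean"},{"LOWER":"joseph"},{"LOWER":"laborde"}]},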
]
specific_forms_patterns_persons = [
{"label": "PER", "pattern": [{"ENT_TYPE": "PER"}, {"IS_PUNCT": True}, {"ENT_TYPE": "PER"}]}
]
# add patterns to ruler
ruler.add_patterns(patterns)
ruler.add_patterns(specific_forms_patterns_persons)
# run the pipeline on the input sentence to get a Doc object
doc = nlp("Jules, Michelet avec Laborde, Jean Joseph et Jacques Mei à Paris.")
# print the entities found in the sentence once the EntityRuler has run
print([(ent.text, ent.label_) for ent in doc.ents])
I get this output:
[('Jules', 'PER'), ('Michelet', 'PER'), ('Laborde', 'PER'), ('Jean Joseph', 'PER'), ('Jacques Mei', 'PER'), ('Paris', 'LOC')]
whereas I would like to get:
[('Jules, Michelet', 'PER'), ('Laborde, Jean Joseph', 'PER'), ('Jacques Mei', 'PER'), ('Paris', 'LOC')]
I also tried customizing my pattern with:
specific_forms_patterns_persons = [
{"label": "PER", "pattern": [{"ENT_TYPE": "PER"}, {"ORTH": ","}, {"ENT_TYPE": "PER"}]}
]
but I still get the same output. Training a spaCy model to recognize these specific forms would probably be the best solution, but I was wondering whether this is possible with rules alone.
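One rules-only direction that may be worth trying, sketched here under assumptions: by default the `entity_ruler` does not overwrite entities already set by `ner` (its `overwrite_ents` setting is off), and each `{"ENT_TYPE": "PER"}` dict matches a single token, so a two-token name such as "Jean Joseph" needs `"OP": "+"`. To keep the sketch runnable without `fr_core_news_lg`, a first ruler named `stub_ner` (a hypothetical stand-in, not part of the code above) reproduces the entities from the reported output. The names `build_pipeline`, `extract_entities` and `merge_ruler` are also illustrative only.

```python
import spacy

TEXT = "Jules, Michelet avec Laborde, Jean Joseph et Jacques Mei à Paris."


def build_pipeline():
    # Blank French pipeline: tokenizer only, no pretrained model required.
    nlp = spacy.blank("fr")

    # ASSUMPTION: "stub_ner" is a hypothetical stand-in for the statistical
    # `ner` component; it only reproduces the entities reported in the question.
    stub = nlp.add_pipe("entity_ruler", name="stub_ner")
    stub.add_patterns([
        {"label": "PER", "pattern": "Jules"},
        {"label": "PER", "pattern": "Michelet"},
        {"label": "PER", "pattern": "Laborde"},
        {"label": "PER", "pattern": "Jean Joseph"},
        {"label": "PER", "pattern": "Jacques Mei"},
        {"label": "LOC", "pattern": "Paris"},
    ])

    # Second ruler: overwrite_ents=True lets it replace the upstream spans.
    merger = nlp.add_pipe(
        "entity_ruler", name="merge_ruler", config={"overwrite_ents": True}
    )
    merger.add_patterns([
        {
            "label": "PER",
            "pattern": [
                {"ENT_TYPE": "PER", "OP": "+"},  # one or more PER tokens
                {"ORTH": ","},                   # a literal comma
                {"ENT_TYPE": "PER", "OP": "+"},  # one or more PER tokens
            ],
        }
    ])
    return nlp


def extract_entities(text):
    doc = build_pipeline()(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


if __name__ == "__main__":
    print(extract_entities(TEXT))
```

With the real model, the equivalent would presumably be `nlp.add_pipe("entity_ruler", after="ner", config={"overwrite_ents": True})` with the same merging pattern. One caveat: a comma-separated list of different people (for example "Jules Michelet, Jean Joseph Laborde") would also be merged into a single entity, so the rule only suits texts where a comma between two PER spans reliably separates the parts of one name.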