First, you don't have to know French to help me, as I will explain the grammar rules that I need to apply with spaCy in Python. I have a file (test.txt) with about 5000 distinct French phrases, and a mail (textstr) that is different each time (a mail that our client sends us). For each mail I have to check whether any of the phrases in the file appears in the mail. I thought of using spaCy's PhraseMatcher, but I have one problem: in each mail the verbs are conjugated, so I cannot use the PhraseMatcher's default behaviour (it matches on the verbatim token text and does not take verb conjugation into account). So I first thought of matching on lemmas with the PhraseMatcher to solve my problem, since all conjugated forms of a verb share the same lemma:
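import spacy
from spacy.matcher import PhraseMatcher

def treatemail(emailcontent):
    nlp = spacy.load("fr_core_news_sm")
    # One phrase per line, written in the infinitive
    with open('test.txt', 'r', encoding="utf-8") as f:
        phrases_list = f.readlines()
    # Match on lemmas instead of the verbatim token text
    phrase_matcher = PhraseMatcher(nlp.vocab, attr="LEMMA")
    patterns = [nlp(phrase.strip()) for phrase in phrases_list]
    # spaCy v2 signature; in v3 this is phrase_matcher.add('phrases', patterns)
    phrase_matcher.add('phrases', None, *patterns)
    mail = nlp(emailcontent)
    matched_phrases = phrase_matcher(mail)
    for match_id, start, end in matched_phrases:
        span = mail[start:end]  # slice the parsed mail
        print(span.text)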
This works for about 85% of the phrases from the file, but for the remaining 15% it does not, because some French verbs take a reflexive pronoun (a pronoun that comes before the verb): me, te, se, nous, vous, se + verb, and the elided equivalents m', t' and s' + verb when the verb starts with a vowel. (They essentially always agree with the subject they refer to.)
In the text file the phrases are written in the infinitive, so if a phrase contains a reflexive pronoun, it appears in its infinitive form (either se + verb, or s' + verb when the verb starts with a vowel), e.g. "s'amuser" (to have fun), "se promener" (to take a walk). In the mail the verb is conjugated together with its reflexive pronoun ("Je me promène" (I take a walk)).
What I have to do is essentially make the PhraseMatcher take the reflexive pronouns into account. So here's my question: how can I do that? Should I write a custom component that checks whether there's a reflexive pronoun in the email and changes the text to its infinitive form, or is there some other way?
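To make the custom-component idea concrete, here is a minimal sketch in plain Python, without spaCy. It assumes each token is available as a (text, lemma, pos) triple; the triples below are written by hand and are not real spaCy output, and the names normalize and contains are hypothetical helpers. The idea is to map every reflexive-looking pronoun that sits directly before a verb to the single form "se", in both the phrase and the mail, before comparing:

```python
# Sketch only: tokens are hypothetical (text, lemma, pos) triples, hand-written
# here in the shape a tagger such as spaCy could provide.
REFLEXIVE_FORMS = {"me", "te", "se", "nous", "vous", "m'", "t'", "s'"}


def normalize(tokens):
    """Return lowercase lemmas, with any reflexive-looking pronoun that sits
    directly before a verb replaced by the canonical form "se"."""
    keys = []
    for i, (text, lemma, pos) in enumerate(tokens):
        form = text.lower().replace("\u2019", "'")  # unify curly apostrophes
        before_verb = i + 1 < len(tokens) and tokens[i + 1][2] == "VERB"
        if form in REFLEXIVE_FORMS and before_verb:
            keys.append("se")
        else:
            keys.append(lemma.lower())
    return keys


def contains(haystack, needle):
    """True if needle occurs as a contiguous run inside haystack."""
    n = len(needle)
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))


# Phrase from the file: "se promener"; mail: "Je me promène"
phrase = [("se", "se", "PRON"), ("promener", "promener", "VERB")]
mail = [("Je", "je", "PRON"), ("me", "me", "PRON"), ("promène", "promener", "VERB")]

# Plain lemma comparison misses the phrase; the normalized comparison finds it.
plain_match = contains([lemma for _, lemma, _ in mail], [lemma for _, lemma, _ in phrase])
normalized_match = contains(normalize(mail), normalize(phrase))
print(plain_match, normalized_match)
```

I am aware this over-matches: a subject "nous" or "vous" directly before a verb, or a non-reflexive object pronoun as in "il me promène", would also be rewritten to "se". Compound tenses, where an auxiliary sits between the pronoun and the verb, are not handled either.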
Thank you very much!