Spacy.io DependencyMatcher Isn't Grouping MatchIDs

Question

I have been working with the Spacy.io DependencyMatcher and find it very powerful, but I have a question I couldn't answer from the documentation. The match results are a list of many tuples that all share the same match ID, instead of one tuple per match that contains all of the matched tokens.

For example, here are the matches I am getting:

[(7324372616739864093, [1, 5]), (7324372616739864093, [1, 6]), (7324372616739864093, [1, 7]), (7324372616739864093, [1, 9]), (7324372616739864093, [1, 10]), (7324372616739864093, [1, 11]), (7324372616739864093, [1, 13]), (7324372616739864093, [1, 15])]

But I expect the matches to be:

[(7324372616739864093, [1, 5, 6, 7, 9, 10, 11, 13, 15])]

Here is the code. Can someone tell me what I am doing wrong?

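import spacy
from spacy.matcher import DependencyMatcher

nlp = spacy.load("en_core_web_sm")  # assumed: any pipeline with a parser and tagger
doc = nlp("...")  # the sample text was not included in the question

matcher = DependencyMatcher(nlp.vocab)
pattern = [
    # Anchor node: the noun "experience"
    {
        "RIGHT_ID": "anchor_experience",
        "RIGHT_ATTRS": {"LOWER": "experience", "POS": "NOUN"}
    },
    # Any noun, proper noun or verb that is a descendant (">>") of the anchor
    {
        "LEFT_ID": "anchor_experience",
        "REL_OP": ">>",
        "RIGHT_ID": "skills",
        "RIGHT_ATTRS": {"POS": {"IN": ["NOUN", "PROPN", "VERB"]}}
    },
]

matcher.add("EXPERIENCE", [pattern])
matches = matcher(doc)
print(matches)

# Pair each token index with the pattern dict at the same position
for match_id, token_ids in matches:
    for node, token_id in zip(pattern, token_ids):
        print(node["RIGHT_ID"] + ":", doc[token_id].text)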

You should really add your sample text here. – polm23 Dec 31 '21 at 05:00

score 1 · Answer 1 · answered Dec 31 '21 at 05:00

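In DependencyMatcher output, each match contains one token index per dictionary in the input pattern, listed in the same order as the pattern. Your pattern has two dictionaries, so every match has exactly two token indices (the "experience" anchor and one "skills" token), and a single doc can produce many matches: one for each combination of tokens that satisfies the whole pattern.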
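
A minimal sketch of that positional mapping, pairing each token index with the RIGHT_ID of the dictionary at the same position. It does not need spaCy: the match list is copied from the question, and `label_match` is a hypothetical helper name, not part of the spaCy API.

```python
# Pattern as in the question; only RIGHT_ID is used for labeling here.
pattern = [
    {"RIGHT_ID": "anchor_experience",
     "RIGHT_ATTRS": {"LOWER": "experience", "POS": "NOUN"}},
    {"LEFT_ID": "anchor_experience", "REL_OP": ">>",
     "RIGHT_ID": "skills",
     "RIGHT_ATTRS": {"POS": {"IN": ["NOUN", "PROPN", "VERB"]}}},
]


def label_match(pattern, token_ids):
    """Map each pattern node's RIGHT_ID to the token index that matched it.

    Hypothetical helper: relies on token_ids being in pattern order.
    """
    if len(token_ids) != len(pattern):
        raise ValueError("expected one token index per pattern dict")
    return {node["RIGHT_ID"]: tid for node, tid in zip(pattern, token_ids)}


# Two of the matches reported in the question.
matches = [(7324372616739864093, [1, 5]), (7324372616739864093, [1, 6])]
for match_id, token_ids in matches:
    print(label_match(pattern, token_ids))
```

Each printed dict names the role every token played, which is exactly what the per-match layout is designed to preserve.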

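This layout makes it easy to connect each match back to the pattern: the i-th token index belongs to the i-th dictionary. For a two-dictionary pattern like yours the mapping is unambiguous anyway, but for more complex patterns it tells you which token filled which role.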
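
As far as I know, DependencyMatcher has no option to merge these tuples for you, so if you want the grouped form from the question you can collapse the matches after matching. The sketch below groups by (match ID, anchor token index), assuming the anchor is the first index in each match, which holds here because the anchor dict comes first in the pattern. `group_by_anchor` is a hypothetical helper, not a spaCy API.

```python
from collections import defaultdict


def group_by_anchor(matches):
    """Collapse (match_id, [anchor, ...]) tuples into one entry per anchor.

    Hypothetical helper: assumes the anchor is the first token index.
    """
    grouped = defaultdict(list)  # (match_id, anchor) -> other token indices
    for match_id, token_ids in matches:
        anchor, *rest = token_ids
        grouped[(match_id, anchor)].extend(rest)
    # Dicts keep insertion order, so groups come out in first-seen order;
    # set() drops indices repeated across matches of larger patterns.
    return [(match_id, [anchor] + sorted(set(rest)))
            for (match_id, anchor), rest in grouped.items()]


# The matches reported in the question.
matches = [
    (7324372616739864093, [1, 5]), (7324372616739864093, [1, 6]),
    (7324372616739864093, [1, 7]), (7324372616739864093, [1, 9]),
    (7324372616739864093, [1, 10]), (7324372616739864093, [1, 11]),
    (7324372616739864093, [1, 13]), (7324372616739864093, [1, 15]),
]
print(group_by_anchor(matches))
# [(7324372616739864093, [1, 5, 6, 7, 9, 10, 11, 13, 15])]
```

With a real `doc`, `[doc[i].text for i in token_ids]` turns each grouped list back into words.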
