
I'm working on an NLP problem: given a sentence with two entities, I need to generate a boolean for each word indicating whether it lies on the dependency path between those entities.

For example:

'A misty < e1 >ridge< /e1 > uprises from the < e2 >surge< /e2 >'

I want to iterate over each word and tell whether it is on the dependency path between e1 and e2.

Two important notes:

- If you try to help me (first, thanks), don't bother with the XML markup < e1 > and < e2 >. I am really interested in how to find whether a word is on the dependency path between any two given words with spaCy; I take care of choosing the words myself.

- As I'm not an NLP expert, I'm a bit confused about the meaning of "on the dependency path", and I'm sorry if it is not clear enough (these are the words used by my tutor).

Thanks in advance

Valentin Macé

2 Answers

So my solution was found using that post

There is an answer dedicated to spaCy

My implementation for finding the dependency path between two words in a given sentence:

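import networkx as nx
import spacy

# Any English pipeline with a dependency parser works; the model name here
# is an assumption, the original snippet did not show which one was loaded
nlp = spacy.load("en_core_web_sm")
doc = nlp("Ships carrying equipment for US troops are already waiting off the Turkish coast")

def shortest_dependency_path(doc, e1=None, e2=None):
    # One undirected edge per head-child dependency arc, keyed by token text
    # (a word that occurs twice in the sentence collapses into a single node)
    edges = []
    for token in doc:
        for child in token.children:
            edges.append((token.text, child.text))
    graph = nx.Graph(edges)
    try:
        shortest_path = nx.shortest_path(graph, source=e1, target=e2)
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        # No path exists, or one of the words is not in the sentence
        shortest_path = []
    return shortest_path

print(shortest_dependency_path(doc, 'Ships', 'troops'))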

Output:

['Ships', 'carrying', 'for', 'troops']

What it does is first build an undirected graph for the sentence, where words are the nodes and dependencies between words are the edges, and then find the shortest path between two nodes.

For my needs, I then just check, for each word, whether it is on the generated dependency path (the shortest path).
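A minimal sketch of that per-word check, written so that it works on token positions rather than on word text (so a word that appears twice in the sentence is not ambiguous). It is not the code from the post above: the function name `on_dependency_path` is hypothetical, it uses a plain breadth-first search instead of networkx, and the `heads` list in the example is hand-written for the question's sentence rather than taken from an actual parse. With spaCy, I would expect `heads = [token.head.i for token in doc]` to produce the same kind of list for a single-sentence doc, since the root token is its own head.

```python
from collections import deque


def on_dependency_path(heads, start, end):
    """Return one boolean per token: True if it lies on the path start..end.

    heads[i] is the index of token i's syntactic head; the root token
    points to itself.
    """
    # Turn every head-child arc into an undirected edge.
    neighbours = [[] for _ in heads]
    for child, head in enumerate(heads):
        if head != child:
            neighbours[head].append(child)
            neighbours[child].append(head)

    # Breadth-first search from `start`, remembering where each node came from.
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            break
        for nxt in neighbours[node]:
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)

    flags = [False] * len(heads)
    if end not in previous:
        return flags  # the two tokens are not connected

    # Walk back from `end` to `start`, marking every token on the way.
    node = end
    while node is not None:
        flags[node] = True
        node = previous[node]
    return flags


# Tokens: 0 A, 1 misty, 2 ridge, 3 uprises, 4 from, 5 the, 6 surge
# Hand-assigned heads (an assumption, not parser output): "uprises" is the root.
heads = [2, 2, 3, 3, 3, 6, 4]
print(on_dependency_path(heads, start=2, end=6))
```

Because a dependency parse is a tree, there is only one simple path between two tokens, so the shortest path and "the" dependency path are the same thing here.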

Valentin Macé

A dependency path is a way of describing how clauses are built within a sentence. spaCy has a really good example in their docs, with the sentence "Apple is looking at buying U.K. startup for $1 billion."

Pardon my lack of good visualization here, but to work through your example:

A misty ridge uprises from the surge.

In spaCy, we follow their example to get the dependencies:

import spacy
nlp = spacy.load('en_core_web_lg')
doc = nlp("A misty ridge uprises from the surge.")
for chunk in doc.noun_chunks:
    print(chunk.text, chunk.root.text, chunk.root.dep_, chunk.root.head.text)

This will get the "clauses" which make up your sentence. Your output will look like so:

Text                  | root.text| root.dep_ | root.head.text
A misty ridge uprises   uprises    ROOT        uprises
the surge               surge      pobj        from

chunk.text is the text that makes up your dependency clause (note, there may be overlap depending on sentence structure). root.text gives the root (or head) of the dependency tree. The head of the tree is a spaCy token object, and has children that you can iterate through to check if another token is on the dependency tree.
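To illustrate the idea of checking whether one token sits below another in the tree, here is a small sketch that walks up the head pointers instead of iterating over children. The function name `is_in_subtree` and the hand-written `heads` list are assumptions made for illustration; with spaCy the list could presumably be built as `[token.head.i for token in doc]`, and spaCy's own `Token.is_ancestor` covers a similar check directly on tokens.

```python
def is_in_subtree(heads, token, ancestor):
    """True if `token` is `ancestor` itself or sits below it in the tree.

    heads[i] is the index of token i's head; the root points to itself.
    """
    node = token
    while True:
        if node == ancestor:
            return True
        if heads[node] == node:
            # Reached the root without meeting `ancestor`.
            return False
        node = heads[node]


# Tokens: 0 A, 1 misty, 2 ridge, 3 uprises, 4 from, 5 the, 6 surge
# Hand-assigned heads (an assumption, not parser output).
heads = [2, 2, 3, 3, 3, 6, 4]
print(is_in_subtree(heads, token=1, ancestor=2))  # is "misty" under "ridge"?
print(is_in_subtree(heads, token=6, ancestor=2))  # is "surge" under "ridge"?
```

Note that this only answers "is this word under that word", which is a weaker question than "is this word on the path between two given words".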

def find_dependencies(doc, word_to_check=None, dep_choice=None):
    """
    word_to_check is the word you'd like to see on the dependency tree
    example, word_to_check="misty"

    dep_choice is the text of the item you'd like the dependency check
    to be against. Example, dep_choice='ridge'
    """
    tokens, texts = [], []

    for tok in doc:
        tokens.append(tok)
        texts.append(tok.text)

    # grabs the index/indices of the token that you are interested in
    indices = [i for i,text in enumerate(texts) if text==dep_choice]

    words_in_path = []

    for i in indices:

        reference = tokens[i]
        # Token exposes its direct dependents through the .children attribute
        child_elements = [t.text for t in reference.children]
        if word_to_check in child_elements:
            words_in_path.append((word_to_check, reference))

    return words_in_path

The code isn't the prettiest, but that's a way you could get a list of tuples containing the word you want to check versus the associated parent token. Hopefully that's helpful

EDIT:

In the interest of tailoring a bit more to your use case (and massively simplifying what my original answer looks like):

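# This gives you a 'word': <spaCy Token object> lookup
# (if a word occurs more than once, the last occurrence wins)
tokens_lookup = {tok.text: tok for tok in doc}

# .children yields Token objects, so compare on their text
if any(child.text == "misty" for child in tokens_lookup["ridge"].children):
    pass  # Extra logic here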
C.Nivs
  • First thank you for your answer, after trying your code and if I understood well 'find_dependencies' is returning a tuple only if the 'word_to_check' is a child of the 'dep_choice', and this tuple is made of these two arguments. However I do not understand the interest of this function regarding my problem, which could be done by something like `if(word_to_check in dep_choice.children) then ...` I might be missing something though but what I really try to do is, given two entities in a sentence and a word, return true (resp. false) if this word is on the dep path between these entities – Valentin Macé Jul 11 '18 at 12:28
  • Could be a symptom of me not entirely grasping what you want. If you are just looking for a boolean response, then `if(word_to_check in dep_choice.children) then ...` could very easily be all you need – C.Nivs Jul 11 '18 at 12:39
  • Edited my answer to include what I think you are looking for. Let me know if that's a bit closer to what you need – C.Nivs Jul 11 '18 at 13:13
  • To clarify what I really need, let's say that we chose two entities in my example sentence. What I need is to generate the dependency path between those entities and then, for each word in the sentence, check if the word is part of the dependency path. **So thanks because you helped me clarify that what i'm looking for is actually only the dep path between two words**, the rest being kind of trivial, I'll come back with my final solution – Valentin Macé Jul 11 '18 at 14:06