I'm trying to build ternary intra-sentence relationships. One of the methods I'm considering is the shortest dependency path between entities, with the POS tag sequence and the dependency label sequence along that path used as features for a kernel-based SVM. I'm not sure how to formulate these features.
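import spacy
import networkx as nx

# Assumption: any English pipeline with a dependency parser works; the model name is a guess
nlp = spacy.load('en_core_web_sm')
txt = 'Domestic revenues increased 14% to $680.8 million and were 77% of total revenues for the year ended December 31, 2015.'
doc = nlp(txt)

# Print (head, token, dependency label, coarse POS tag) for every token
for token in doc:
    print((token.head.text, token.text, token.dep_, token.pos_))

# Build an undirected graph from head -> child edges.
# Note: nodes are lowercased word forms, so repeated words (e.g. 'revenues') share one node.
edges = []
for token in doc:
    for child in token.children:
        edges.append((token.lower_, child.lower_))
graph = nx.Graph(edges)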
The shortest path between the second token of "Domestic revenues" and "2015" looks like this:
shortest path length: 7
shortest path: ['revenues', 'increased', 'were', 'for', 'year', 'ended', 'december', '2015']
How do I use this dependency graph as a feature sequence for a ternary relationship? (Audit-nsubj-increased-quantmod-million--conj-were)
How do I use generalized POS tags for these entity relationships (Audit-verb-num-num)?
Since the entities in question are compound, I'm OK with the model classifying the last tokens of the entities as a ternary relationship, like this: (revenues, million, 2015) --> (Audit, value, data)
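To make the question concrete, here is a rough sketch of the kind of feature strings I have in mind. It is only an illustration under several assumptions: the parse in TOKENS is hand-written and simplified (it is not real spaCy output), the helper names (build_adjacency, shortest_path, path_features, ternary_features) are made up, and splitting the ternary relation into two pairwise paths (entity 1 to entity 2, entity 2 to entity 3) is just one possible formulation. Nodes are token positions rather than word forms, so repeated words stay distinct.

```python
from collections import deque

# Hand-written, simplified parse (an assumption for illustration only).
# Each entry: (text, head index, dependency label, coarse POS tag).
TOKENS = [
    ("revenues", 1, "nsubj", "NOUN"),
    ("increased", 1, "ROOT", "VERB"),
    ("to", 1, "prep", "ADP"),
    ("million", 2, "pobj", "NUM"),
    ("were", 1, "conj", "AUX"),
    ("for", 4, "prep", "ADP"),
    ("year", 5, "pobj", "NOUN"),
    ("ended", 6, "acl", "VERB"),
    ("december", 7, "npadvmod", "PROPN"),
    ("2015", 8, "nummod", "NUM"),
]


def build_adjacency(tokens):
    """Undirected adjacency over token indices, one edge per head-child pair."""
    adj = {i: set() for i in range(len(tokens))}
    for i, (_, head, _, _) in enumerate(tokens):
        if head != i:  # the root points at itself, so skip it
            adj[i].add(head)
            adj[head].add(i)
    return adj


def shortest_path(adj, start, goal):
    """Breadth-first search; returns the list of token indices from start to goal."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(adj[node]):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return []


def path_features(tokens, path, entity_labels):
    """POS sequence (entity tokens replaced by their labels) and directed dep labels."""
    pos = [entity_labels.get(i, tokens[i][3]) for i in path]
    deps = []
    for a, b in zip(path, path[1:]):
        if tokens[a][1] == b:   # b is the head of a: step up the tree
            deps.append("up:" + tokens[a][2])
        else:                   # a is the head of b: step down the tree
            deps.append("down:" + tokens[b][2])
    return " ".join(pos), " ".join(deps)


def ternary_features(tokens, e1, e2, e3, entity_labels):
    """Two pairwise shortest-path feature strings for one candidate triple."""
    adj = build_adjacency(tokens)
    feats = {}
    for name, (a, b) in (("e1_e2", (e1, e2)), ("e2_e3", (e2, e3))):
        pos, dep = path_features(tokens, shortest_path(adj, a, b), entity_labels)
        feats[name + "_pos"] = pos
        feats[name + "_dep"] = dep
    return feats


# Entity labels taken from the example triple (revenues, million, 2015)
labels = {0: "Audit", 3: "value", 9: "data"}
feats = ternary_features(TOKENS, 0, 3, 9, labels)
print(feats["e1_e2_pos"])  # Audit VERB ADP value
print(feats["e1_e2_dep"])  # up:nsubj down:prep down:pobj
```

My guess is that strings like these could either go into a sequence (subsequence) kernel directly or be turned into n-gram counts for a standard kernel, but I'm not sure which is the right way to do it, or whether the two-path split is a sensible encoding of a ternary relation.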