
I am trying to extract relations (triples) from sentences and have been trying to manually sift through the dependency parse from Stanford's CoreNLP and pull out subject - verb - object relations that way.

The problem is that the moment you go beyond a simple sentence like "I am happy" and run into appositive phrases, ccomp and xcomp compound verbs, and conj conjunctions, finding relations becomes much more complicated.

Example: "My teacher, Bob is a great teacher" (my teacher, is, great teacher) & (My teacher, is, Bob)

"My friends and I don't like running or jumping." (my friends, don't like, running) & (my friends, don't like, jumping) & (I, don't like, running) & (I, don't like, jumping)

Stanford's OpenIE really doesn't work well for these scenarios (it resolves the first example fairly well, but doesn't get any relations for the second).

My question is: Are there any open source libraries for Java that could perform this relation extraction - either directly from the text or from a dependency parse?

I did come across https://github.com/knowitall/ollie, which looks very promising. However, Ollie is strictly prohibited for commercial use, and I need to be able to use the library commercially in the future.


Another idea: I'm not very familiar with machine learning techniques - but I was thinking, could I somehow pass a dependency parse of a sentence to some ML model training algorithm with my desired outputs as shown in the examples above and train a model that could do this relation extraction for me?

abagshaw
Please note that "Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it." Asking for a library to do this is off-topic here. However, turning a tree structure into an RDF-based representation isn't too difficult. If you've written some code to start this process, asking about it would be on topic. See http://stackoverflow.com/questions/22831474/triple-extraction-from-a-sentance?rq=1 – fnl Mar 28 '16 at 09:09

1 Answer


Given you already have (or can build) the dependency trees and need this in a commercial setting, I think it is easiest to manually encode the special-case rules to create your triplets, e.g., for handling conjunctions like the one in your example, and see if that is good enough for you [1].
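To make the "special-case rule" idea concrete, here is a minimal sketch of one such rule: expanding `conj` edges on the subject and object side into the cross product of triples. It is only an illustration under several assumptions. The `Edge` class, `withConjuncts`, `extract` and `demoEdges` are hypothetical names, not part of CoreNLP. The edge list is hand-built, with phrases like "my friends" and "don't like" already merged into single nodes, and it assumes the parser attaches "running" to the verb as `xcomp` (or `dobj`). In a real setup you would read the edges from the parser's dependency graph instead.

```java
import java.util.ArrayList;
import java.util.List;

public class ConjTripleSketch {

    /** One dependency edge: governor --relation--> dependent (hypothetical helper type). */
    public static final class Edge {
        final String governor;
        final String relation;
        final String dependent;

        public Edge(String governor, String relation, String dependent) {
            this.governor = governor;
            this.relation = relation;
            this.dependent = dependent;
        }
    }

    /** Returns the head itself plus every node attached to it by a "conj" edge. */
    public static List<String> withConjuncts(String head, List<Edge> edges) {
        List<String> result = new ArrayList<>();
        result.add(head);
        for (Edge e : edges) {
            if (e.relation.equals("conj") && e.governor.equals(head)) {
                result.add(e.dependent);
            }
        }
        return result;
    }

    /** For each verb with a subject and an object-like dependent, emits all subject x object pairs. */
    public static List<String> extract(List<Edge> edges) {
        List<String> triples = new ArrayList<>();
        for (Edge subj : edges) {
            if (!subj.relation.equals("nsubj")) {
                continue;
            }
            String verb = subj.governor;
            for (Edge obj : edges) {
                // Assumption: objects show up as dobj or xcomp dependents of the same verb.
                boolean objectLike = obj.relation.equals("dobj") || obj.relation.equals("xcomp");
                if (!objectLike || !obj.governor.equals(verb)) {
                    continue;
                }
                for (String s : withConjuncts(subj.dependent, edges)) {
                    for (String o : withConjuncts(obj.dependent, edges)) {
                        triples.add("(" + s + ", " + verb + ", " + o + ")");
                    }
                }
            }
        }
        return triples;
    }

    /** Hand-built, simplified edges for "My friends and I don't like running or jumping." */
    public static List<Edge> demoEdges() {
        List<Edge> edges = new ArrayList<>();
        edges.add(new Edge("don't like", "nsubj", "my friends"));
        edges.add(new Edge("my friends", "conj", "I"));
        edges.add(new Edge("don't like", "xcomp", "running"));
        edges.add(new Edge("running", "conj", "jumping"));
        return edges;
    }

    public static void main(String[] args) {
        for (String triple : extract(demoEdges())) {
            System.out.println(triple);
        }
    }
}
```

On this toy input the rule yields the four triples listed in the question. An appositive rule would follow the same pattern (look for an `appos` edge on the subject and emit an extra "is" triple), and each further construction you care about gets its own small rule.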

Other than "Stanford OpenIE", there is much more research around on web-scale Open Relation Extraction or Open Information Extraction (although I prefer the more precise term [Predicate] Triplet Extraction) [2,3], most notably ReVerb [4] itself (from the same KnowItAll roots as Ollie, BTW), but that is strictly non-commercial, too...

However, to quote from similar replies to similar questions: """Please note that "Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it." Asking for a library to do this is off-topic here."""

[1] http://www.nist.gov/tac/publications/2013/participant.papers/UWashington.TAC2013.proceedings.pdf

[2] http://www.cs.washington.edu/research/projects/aiweb/media/papers/etzioni-ijcai2011.pdf

[3] http://nlp.stanford.edu/pubs/2015angeli-openie.pdf

[4] http://reverb.cs.washington.edu/

fnl