I'm trying to build a sentence simplification algorithm based on Stanford CoreNLP. One of the simplifications I want to do is to transform a sentence with homogeneous (coordinated) parts into several sentences. E.g.
I love my mom, dad and sister. -> I love my mom. I love my dad. I love my sister.
First of all, I build a semantic graph for the input sentence string:
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.simple.Sentence;
final Sentence parsed = new Sentence(sentence);
final SemanticGraph dependencies = parsed.dependencyGraph();
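To see what the parser produced I simply dump the graph (if I remember correctly, the default toString() prints the recursive tree form shown below):

System.out.println(dependencies.toString());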
The dependency graph for this sentence is
-> love/VBP (root)
  -> I/PRP (nsubj)
  -> mom/NN (dobj)
    -> my/PRP$ (nmod:poss)
    -> ,/, (punct)
    -> dad/NN (conj:and)
    -> and/CC (cc)
    -> sister/NN (conj:and)
  -> dad/NN (dobj)
  -> sister/NN (dobj)
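Note that dad/NN (dobj) and sister/NN (dobj) only show up because this is the CC-processed (enhanced) graph, where the object relation is propagated over the conjunction. As far as I understand the API, those propagated edges are flagged as "extra", so they can be told apart from the original tree edges:

for (SemanticGraphEdge edge : dependencies.edgeListSorted()) {
    // the propagated conjunct edges (love -> dad, love -> sister) should report isExtra() == true
    System.out.println(edge.getGovernor().word() + " -" + edge.getRelation() + "-> "
            + edge.getDependent().word() + " (extra: " + edge.isExtra() + ")");
}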
Then I find the dobj edges in the graph, as well as the nsubj edge:
final List<SemanticGraphEdge> modifiers = new ArrayList<>();
SemanticGraphEdge subj = null;
for (SemanticGraphEdge edge : dependencies.edgeListSorted()) {
    if (edge.getRelation().getShortName().startsWith("dobj")) {
        modifiers.add(edge);   // direct objects: mom, dad, sister
    } else if (edge.getRelation().getShortName().startsWith("nsubj")) {
        subj = edge;           // the subject edge: love -> I
    }
}
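(As an aside, I gather the same matching could be done with a Semgrex pattern from edu.stanford.nlp.semgraph.semgrex instead of looping over the edges by hand; the pattern below is just my guess at how that would look for the dobj case:)

SemgrexPattern pattern = SemgrexPattern.compile("{}=gov >dobj {}=obj");
SemgrexMatcher matcher = pattern.matcher(dependencies);
while (matcher.find()) {
    System.out.println(matcher.getNode("gov") + " -> " + matcher.getNode("obj"));
}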
So now I have 3 edges in modifiers (for mom, dad and sister) and the nsubj edge with the word I. My problem now is how to split the semantic graph into 3 separate graphs.
Of course, the naive solution is just to build a sentence from subj and the governor/dependent of each dobj edge, but I understand that this is a bad idea and won't work on more complicated examples:
for (final SemanticGraphEdge edge : modifiers) {
    SemanticGraph semanticGraph = dependencies.makeSoftCopy();   // copy of the graph (not used yet)
    final IndexedWord governor = edge.getGovernor();
    final IndexedWord dependent = edge.getDependent();
    final String governorTag = governor.backingLabel().tag().toLowerCase();
    if (governorTag.startsWith("vb")) {
        // naive: subject + verb + object, e.g. "I love dad. "
        StringBuilder b = new StringBuilder(subj.getDependent().word());
        b.append(" ")
         .append(governor.word())
         .append(" ")
         .append(dependent.word())
         .append(". ");
        System.out.println(b);
    }
}
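What I was imagining instead (only a rough sketch, and I'm not sure these are the right API calls) is to build each simplified sentence from the subject subtree, the verb and exactly one object subtree, skipping conj/cc/punct edges so the other conjuncts and the coordinator don't leak in. It still loses shared modifiers, e.g. it would print "I love dad." rather than "I love my dad.":

// reuses `dependencies`, `modifiers` (the three dobj edges) and `subj` from above
for (SemanticGraphEdge dobjEdge : modifiers) {
    Set<IndexedWord> keep = new HashSet<>();
    keep.add(dobjEdge.getGovernor());                               // the verb: love
    collectSubtree(dependencies, subj.getDependent(), keep);        // the subject subtree: I
    collectSubtree(dependencies, dobjEdge.getDependent(), keep);    // exactly one object subtree
    // read the kept words back off in their original sentence order
    List<IndexedWord> words = new ArrayList<>(keep);
    words.sort(Comparator.comparingInt(IndexedWord::index));
    System.out.println(words.stream().map(IndexedWord::word).collect(Collectors.joining(" ")) + ".");
}

// collects a node and its descendants, but does not cross conj/cc/punct edges
private static void collectSubtree(SemanticGraph graph, IndexedWord node, Set<IndexedWord> out) {
    if (!out.add(node)) {
        return;   // already visited (the enhanced graph is not always a tree)
    }
    for (SemanticGraphEdge child : graph.outgoingEdgeList(node)) {
        String reln = child.getRelation().getShortName();
        if (reln.equals("conj") || reln.equals("cc") || reln.equals("punct")) {
            continue;
        }
        collectSubtree(graph, child.getDependent(), out);
    }
}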
Can anyone give me some advice? Maybe I missed something useful in the CoreNLP documentation? Thanks.