
I am trying to build a knowledge graph from textual documents (unstructured data). My current approach is to extract triples from the data and send them to a graph database, e.g. Neo4j, for further analysis. What I notice, however, is that constructing triples yields many of what I would call 'conditional triples'. An example:

text = "Donald Trump was president-elect for the republican party since July 2016"

This provides the following 'interesting' triples:

(Donald Trump, was, president-elect)
(Donald Trump, was president-elect for, republican party)
(Donald Trump, was president-elect for republican party since, July 2016)

We thus need 4 nodes:
1. Donald Trump
2. president-elect
3. republican party
4. July 2016

Those are the 4 nodes that might have interesting relations to other entities in the graph. My difficulty (or doubt), however, is with the relationships: they seem very specific and long.

I am not sure whether this is actually an issue, or whether it would be best practice to include such long relationships, for example 'was president-elect for republican party since'.

I have considered looking into creating traversals like:

(Donald Trump)-[was]->(president-elect)-[for]->(republican party)-[since]->(July 2016)

This yields simpler relationships. However, either the traversal is unique, so that other president-elects are not related to this particular 'president-elect' node, or it is not unique, in which case other president-elects are related to the same node but the 'for' and 'since' relationships can no longer be traced back uniquely to Donald Trump.

As a result I am now inclined to apply the longer relationships. My question therefore is: Is that a best-practice approach, or am I missing alternative solutions?

N Meibergen

2 Answers


Here is a possible data model:

(:Person {name:"Donald Trump"})-[:ACHIEVED {date:'2016-07-01'}]->(pos:Position)
(pos)-[:HAS_TITLE]->(:Title {name:"President Elect"})
(pos)-[:FOR_PARTY]->(:Party {name:"Republican"})

The Person, Title, and Party nodes are unique.
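
As a minimal sketch of how this model could be loaded, the Python below builds a parameterized Cypher statement. The helper name build_position_query and the parameter names are hypothetical, not part of the answer. It assumes MERGE for the unique Person, Title, and Party nodes and CREATE for the per-person Position node, so 'for' and 'since' stay attributable to one person while titles and parties are shared.

```python
from typing import Dict, Tuple

# Hypothetical Cypher template for the model above.
# MERGE reuses an existing Person/Title/Party node if one matches;
# CREATE always adds a fresh Position node for this person.
POSITION_QUERY = """
MERGE (p:Person {name: $person})
MERGE (t:Title {name: $title})
MERGE (party:Party {name: $party})
CREATE (p)-[:ACHIEVED {date: $date}]->(pos:Position)
CREATE (pos)-[:HAS_TITLE]->(t)
CREATE (pos)-[:FOR_PARTY]->(party)
""".strip()


def build_position_query(
    person: str, title: str, party: str, date: str
) -> Tuple[str, Dict[str, str]]:
    """Return the Cypher text and its parameters (illustrative helper)."""
    params = {"person": person, "title": title, "party": party, "date": date}
    return POSITION_QUERY, params


query, params = build_position_query(
    "Donald Trump", "President Elect", "Republican", "2016-07-01"
)
```

The resulting query and params could then be passed to whichever Neo4j client you use. Note that, because Position is created with CREATE, running the same statement twice would add a second Position node; whether that is desirable depends on how you deduplicate extracted triples.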

cybersam
  • Great suggestion indeed. Implementing NER is relevant for this approach, at least for recognizing dates. – N Meibergen Jan 07 '20 at 06:16

How are you extracting those triples? I would suggest using NER and POS tagging for knowledge extraction from your data. Then, based on the entities available, you can design your graph.
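
To illustrate only the date part of such an extraction step, here is a rough sketch that uses a regular expression as a stand-in for a real NER model; it is not NER itself. The function name find_month_year is hypothetical, and defaulting the day to the first of the month is an assumption.

```python
import re
from typing import Optional

# English month names mapped to their month number.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}

# Matches e.g. "July 2016": a month name followed by a four-digit year.
_MONTH_YEAR = re.compile(r"\b(" + "|".join(MONTHS) + r")\s+(\d{4})\b")


def find_month_year(text: str) -> Optional[str]:
    """Return the first 'Month YYYY' mention as an ISO date, or None."""
    match = _MONTH_YEAR.search(text)
    if match is None:
        return None
    month, year = match.group(1), match.group(2)
    # Assumption: no day is given in the text, so use the first of the month.
    return f"{year}-{MONTHS[month]:02d}-01"


text = "Donald Trump was president-elect for the republican party since July 2016"
date = find_month_year(text)
```

A trained NER model would also cover persons, organizations, and other date formats; this pattern only handles the 'Month YYYY' form that appears in the example sentence.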

TheTeacher