How to relate word concordances and documents using a graph database

Question

I have seen some tutorials about importing data into graph DBs {Neo4j, Titan, ...}. I have some questions about how to apply this to our academic project.

We have a set of documents in PDF and Microsoft Word formats (the format is not so important at the moment) which we want to relate by global topic. For that case, it is easy to picture a solution using {Neo4j, Titan, ...}. However, there is a second case which I think is a little more special: some concepts that appear in the documents must also be related in a graph. We have already pinpointed these concepts, which are almost always represented by multiword terms (for instance "artificial neural network"), and we would like to relate them both within a single document and across the set of documents. We also want to be able to query every context, in one document or in several, in which a given term is used (concordances).

The last task (concordances) is currently done with a relational DB, but we want to centralize all tasks in a single database, which we think should be a graph DB {Neo4j, Titan, ...}.

We would appreciate some guidance on how you think we could adapt our problem to a graph DB {Neo4j, Titan, ...}: for example, documentation about similar examples (if any exist), or a general overview of possible ways to structure the data for import.

I hope this is not too ambiguous. Thank you so much in advance.

score 0 · Answer 1 · answered Jun 30 '14 at 09:47

The question is still a bit broad, but I'll try to give a quick answer for what I've understood.

You can start in an easy way for the "schema" (or structure of the graph):

Create a node for a multiword term
Create a node for a document
Every time there's a concordance, create a link between the two respective ends (term, document).
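The three steps above can be sketched in plain Python before choosing a particular graph DB. This is only an illustrative in-memory model, not the API of Neo4j, Titan, or any other product: the function names (`find_concordances`, `build_graph`, `contexts_for`), the sample documents, and the edge properties (`offset`, `context`) are all assumptions made for the example. It stores the surrounding text on each link, so one link corresponds to one occurrence of a term in a document.

```python
import re

# Hypothetical sample data: plain text already extracted from the PDF/Word files.
DOCUMENTS = {
    "doc1": "An artificial neural network is trained on data.",
    "doc2": "We compare a decision tree with an artificial neural network.",
}
TERMS = ["artificial neural network", "decision tree"]


def find_concordances(term, text, window=20):
    """Return (offset, context) pairs for every occurrence of term in text."""
    hits = []
    for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        hits.append((match.start(), text[start:end]))
    return hits


def build_graph(documents, terms, window=20):
    """Build term nodes, document nodes, and one edge per concordance."""
    nodes = {}
    edges = []
    for doc_id in documents:
        nodes[("Document", doc_id)] = {"label": "Document", "id": doc_id}
    for term in terms:
        nodes[("Term", term)] = {"label": "Term", "text": term}
    for doc_id, text in documents.items():
        for term in terms:
            for offset, context in find_concordances(term, text, window):
                edges.append(
                    {"term": term, "document": doc_id,
                     "offset": offset, "context": context}
                )
    return nodes, edges


def contexts_for(term, edges):
    """Concordance query: every (document, context) in which the term is used."""
    return [(e["document"], e["context"]) for e in edges if e["term"] == term]


nodes, edges = build_graph(DOCUMENTS, TERMS)
for doc_id, context in contexts_for("artificial neural network", edges):
    print(doc_id, "...", context)
```

In a real graph DB the same shape would map onto term nodes, document nodes, and a relationship carrying the context as a property. Whether the context lives on the relationship or on a separate node is a design choice that depends on how you need to query it.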

I would recommend normalizing all your data into a single format before proceeding with the processing/importing: CSV is the usual generic format, but you can also have a look at GraphML (which is widely supported by graph DBs) or GraphSON.
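As a rough illustration of that normalization step, the sketch below writes terms, documents, and concordance links to three CSV tables using only the Python standard library. The column names (`term_id`, `doc_id`, `offset`, `context`, and so on) and the sample rows are hypothetical. They are not the header format required by any specific importer, so check the import documentation of the DB you choose before settling on a layout.

```python
import csv
import io


def to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical normalized records: one table per node type, one for the links.
term_rows = [{"term_id": "t1", "text": "artificial neural network"}]
document_rows = [{"doc_id": "d1", "title": "Intro to ML", "source_format": "pdf"}]
concordance_rows = [
    {"term_id": "t1", "doc_id": "d1", "offset": 3,
     "context": "An artificial neural network is trained"}
]

terms_csv = to_csv(term_rows, ["term_id", "text"])
documents_csv = to_csv(document_rows, ["doc_id", "title", "source_format"])
concordances_csv = to_csv(concordance_rows, ["term_id", "doc_id", "offset", "context"])

print(terms_csv)
print(documents_csv)
print(concordances_csv)
```

Keeping nodes and links in separate tables means the original PDF or Word format no longer matters once the text has been extracted, and the same files can be reused if you later switch to a different graph DB.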

If you want to browse some example "schemas" for graphs, have a look at the Neo4j gist collection: it covers a wide range of topics and can serve as inspiration for your own structure.

Hi Marco, thank you for your answer. I still have some doubts. Is it possible to store documents inside Neo4j, or do I need another RI-oriented DB to manage the documents and use Neo4j only to relate them (and the desired contents at different levels)? Actually I'm not a DB expert, sorry... — Nacho, Jun 30 '14 at 17:35
I see. Perhaps you can have a look at OrientDB then, which is a mixed document/graph DB: http://www.orientechnologies.com/orientdb/ — MarcoL, Jun 30 '14 at 17:46
