
I am using TitanDB (IBM Graph) rather than Neo4j, but because I am new to graph databases I have, for now, modelled a basic news feed using the schema suggested in the Neo4j documentation.

http://neo4j.com/docs/snapshot/cypher-cookbook-newsfeed.html

Having fully read all the documentation, I am aware of several key differences between the way these databases operate.

In the model described in the link, each of a user's posts is stored as a vertex, and the posts are connected to one another by edges, forming a long list of status updates emanating from each user vertex.

While this makes sense given Neo4j's capabilities, I am aware that TitanDB has vertex-centric indexing abilities, described in detail here:

http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html

Right now I am trying to ensure that querying for a given user's feed is optimal for a large graph with lots of users and lots of permanently kept posts or status updates. Therefore, I would like to avoid having to traverse all the posts of all of a user's friends, and then order and limit them, just to get the first 15 items of a user's feed.

As such, I am unsure whether the model described in the Neo4j documentation is really the best one to use with TitanDB, so my questions are as follows:

  • Is the model described in the Neo4j documentation optimal for fast news feed retrieval in TitanDB?
  • If so, what indexes would I need to create in order to retrieve a user's feed optimally?
  • If not, would I be better off connecting each post vertex directly to the user who posted it, and using a vertex-centric index on the time property of each posted edge?

I'm really after some general advice on modelling, indexing, and retrieving a basic news feed in TitanDB. Thanks in advance.

gordyr

1 Answer


The basic schema doesn't seem like a bad approach, though it's difficult to make a good judgement based on this one use case.

The simplest approach to solving your indexing problem is probably to denormalize a bit - store the user id as a property on the post vertex and create an index on the [user, timestamp] pair.
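To make the idea concrete, here is a minimal sketch in plain Python of what an index on the [user, timestamp] pair buys you. It is not Titan or IBM Graph API code; `PostIndex`, `latest`, and `feed` are hypothetical names used only for illustration, and timestamps are assumed to be plain integers. The point is that the newest posts of one user can be read without touching that user's older posts.

```python
import bisect
import heapq


class PostIndex:
    """Toy stand-in for a database index on the (user_id, timestamp) pair."""

    def __init__(self):
        # Sorted list of (user_id, -timestamp, post_id). Negating the
        # timestamp makes each user's newest post sort first.
        self._entries = []

    def add(self, user_id, timestamp, post_id):
        bisect.insort(self._entries, (user_id, -timestamp, post_id))

    def latest(self, user_id, limit):
        # Seek to the first entry for this user, then read forward
        # until the limit is reached or another user's entries begin.
        start = bisect.bisect_left(self._entries, (user_id,))
        result = []
        for i in range(start, len(self._entries)):
            uid, neg_ts, post_id = self._entries[i]
            if uid != user_id or len(result) == limit:
                break
            result.append((-neg_ts, post_id))
        return result


def feed(index, friend_ids, limit=15):
    # Each friend contributes at most `limit` candidates, so the work is
    # bounded by len(friend_ids) * limit rather than by total post count.
    candidates = []
    for friend_id in friend_ids:
        candidates.extend(index.latest(friend_id, limit))
    return heapq.nlargest(limit, candidates)
```

In a real deployment the database would maintain this ordering for you; the sketch only shows why the lookup cost depends on the feed size and the number of friends, not on how many posts have accumulated.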

Vertex-centric indexes might help you, but not in the proposed model - you'd need to model the post as an edge, not a vertex, which may make other traversals rather awkward. Furthermore, IBM Graph does not support vertex-centric indexes as of its current release.
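For comparison, this is roughly what a vertex-centric index gives you, again sketched in plain Python rather than with any real graph API. `FeedGraph` and its methods are hypothetical, and the sketch assumes each user vertex carries "posted" edges with an integer time property; the edge target is an opaque post id here, which could equally be the post payload if the post itself is modelled as the edge. Because each vertex keeps its own edges sorted by time, a feed can be built by lazily merging the friends' edge lists and stopping after the first few items.

```python
import bisect
import heapq
from collections import defaultdict
from itertools import islice


class FeedGraph:
    def __init__(self):
        # user_id -> set of followed user ids
        self._follows = defaultdict(set)
        # user_id -> list of (-time, post_id), kept sorted per vertex so
        # the newest edge comes first (the "vertex-centric" ordering).
        self._posted = defaultdict(list)

    def add_follow(self, user_id, friend_id):
        self._follows[user_id].add(friend_id)

    def add_post(self, user_id, time, post_id):
        bisect.insort(self._posted[user_id], (-time, post_id))

    def feed(self, user_id, limit=15):
        # heapq.merge consumes the sorted per-friend lists lazily, so only
        # about `limit` edges (plus one per friend) are ever examined.
        friends = self._follows.get(user_id, ())
        streams = [self._posted.get(f, []) for f in friends]
        merged = heapq.merge(*streams)
        return [(-neg_time, post_id) for neg_time, post_id in islice(merged, limit)]
```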

  • Thanks Benjamin - Just so that I'm clear, does your indexing suggestion work with the Neo4j docs model? Or does it require me to attach the post vertexes directly to the user vertexes via their own edges, instead of as a linked list, as I suggested? Also, would that be a 'Vertex' index, across two properties (userid in the post vertex, and timestamp in the post vertex), or an 'edge' index? (timestamp in the connecting edges). Sorry if the questions seem mundane, Graph databases are quite a change conceptually. – gordyr Oct 18 '16 at 18:37
  • Sorry, I meant the following in the first set of brackets: (userid in the user vertex, and timestamp in the post vertex) – gordyr Oct 18 '16 at 18:46
  • Hi gordyr - off-hand I'd recommend against the linked-list approach, and instead use direct edges. Regarding the index: assuming you included a userid in the Post vertex, it'd be an index on that element only. Indices can't span across multiple elements. – Benjamin Anderson Oct 25 '16 at 17:50
  • While we (the IBM Graph team) are huge fans of StackOverflow for the sorts of questions you've been asking, @gordyr, we also have a public Slack for more informal discussion - happy to have you there, too! You can sign up [here](http://ibm-graph-slackinvite.mybluemix.net/). – Benjamin Anderson Oct 25 '16 at 21:52