9

I would like to represent the changing strength of relationships between nodes in a Neo4j graph.

For a static graph, this is easily done by setting a "strength" property on the relationship:

  A --knows--> B
       |
     strength
       |
       3

However, for a graph that is updated over time there is a problem: incrementing the value of the property can't be done atomically via the REST interface, because a read-before-write is required. Incrementing (rather than merely overwriting) is necessary if the graph is being updated in response to incoming streamed data.
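
To make the read-before-write problem concrete, here is a minimal sketch of a lost update. The dictionary is only a stand-in for the relationship property, and the two "clients" are simulated by interleaving their steps by hand; none of this is Neo4j code.

```python
# Stand-in for the "strength" property on the A --knows--> B relationship.
store = {"strength": 3}

# Two clients each want to increment the value by 1.
# Both read before either one writes.
read_by_client_a = store["strength"]  # client A reads 3
read_by_client_b = store["strength"]  # client B reads 3

store["strength"] = read_by_client_a + 1  # client A writes 4
store["strength"] = read_by_client_b + 1  # client B writes 4, overwriting A's update

# Two increments were requested, but only one took effect.
print(store["strength"])  # 4, not 5
```

Without a transaction or external synchronization around the read and the write, one of the two increments is silently lost.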

I would need to either ensure that only one REST client reads and writes at once (external synchronization), or stick to only the embedded API so I can use the built-in transactions. This may be workable but seems awkward.

One other solution might be to record multiple relationships, without any properties, so that the "strength" is actually the count of relationships, i.e.

A knows B
A knows B
A knows B

means a relationship of strength 3.

  • Disadvantage: only integer strengths can be recorded
  • Advantage: no read-before-write is required
  • Disadvantage: (probably) more storage required
  • Disadvantage: (probably) much slower to extract the value since multiple relationships must be extracted and counted
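
The trade-off in the list above can be sketched with a toy in-memory model. `MultiRelGraph` is a hypothetical class invented for illustration, not part of any Neo4j API: writes are append-only, while reads have to scan and count.

```python
class MultiRelGraph:
    """Toy in-memory stand-in for a graph store (hypothetical, not Neo4j)."""

    def __init__(self):
        # Each entry is one property-less relationship: (start, type, end).
        self.rels = []

    def add_rel(self, start, rel_type, end):
        # Append-only write: no existing value is read first,
        # so concurrent writers cannot overwrite each other's increments.
        self.rels.append((start, rel_type, end))

    def strength(self, start, rel_type, end):
        # The read has to visit and count every parallel relationship.
        return sum(1 for rel in self.rels if rel == (start, rel_type, end))


g = MultiRelGraph()
for _ in range(3):
    g.add_rel("A", "knows", "B")

print(g.strength("A", "knows", "B"))  # 3
```

The cost of `strength` grows with the number of parallel relationships, which is the read-side concern raised above.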

Has anyone tried this approach, and is it likely to run into performance issues, particularly when reading?

Is there a better way to model this?

DNA

3 Answers

5

Nice idea. To reduce storage and repeated reads, those relationships could be aggregated into a single one by a batch job that runs transactionally.

Each relationship could also carry an individual weight value, with the aggregate of those values used as the overall weight. It doesn't have to be integer-based and could also be negative to represent decrements.
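
As a rough sketch of that aggregation step, assuming the aggregate is a plain sum and that relationships are represented as `(start, type, end, weight)` tuples (both are assumptions made for illustration), the batch job could look like this. A real job would run inside a database transaction, which this in-memory version does not model.

```python
from collections import defaultdict


def compact(rels):
    """Merge parallel (start, type, end, weight) records into one per key.

    Summation is an assumed aggregation rule; another rule could be swapped in.
    """
    totals = defaultdict(float)
    for start, rel_type, end, weight in rels:
        totals[(start, rel_type, end)] += weight
    # One record per distinct (start, type, end), carrying the summed weight.
    return [(s, t, e, w) for (s, t, e), w in totals.items()]


rels = [
    ("A", "knows", "B", 1.5),
    ("A", "knows", "B", -0.5),  # a negative weight acts as a decrement
    ("A", "knows", "B", 2.0),
    ("A", "knows", "C", 1.0),
]

compacted = compact(rels)
print(compacted)
```

After compaction there is one record for A-knows-B with weight 3.0 and one for A-knows-C with weight 1.0, so later reads touch fewer relationships.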

You could also write a small server extension for updating a weight value on a single relationship transactionally. It would probably even make sense for the REST API to have a "modify single value" operation in addition to the "set single value" operation:

PUT http://localhost:7474/db/data/node/15/properties/mod/foo 

The body contains the delta value (e.g. 1.5 or -10). Another idea would be to replace the mod keyword with the actual operation:

PUT http://localhost:7474/db/data/node/15/properties/add/foo 
PUT http://localhost:7474/db/data/node/15/properties/or/foo 
PUT http://localhost:7474/db/data/node/15/properties/concat/foo 
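
A minimal sketch of what such a "modify" operation might do on the server side, assuming the operation names `add`, `or` and `concat` from the URLs above. `PropertyStore` and its methods are hypothetical names, and a lock stands in for the transaction; this is not an existing Neo4j endpoint or extension.

```python
import operator
import threading

# Hypothetical operation names mirroring the URL segments proposed above.
OPERATIONS = {
    "add": operator.add,        # numeric delta, e.g. 1.5 or -10
    "or": operator.or_,         # bitwise OR on integers
    "concat": operator.concat,  # sequence concatenation, e.g. strings
}


class PropertyStore:
    """Toy property store; the lock plays the role of a transaction."""

    def __init__(self):
        self._props = {}
        self._lock = threading.Lock()

    def set(self, key, value):
        with self._lock:
            self._props[key] = value

    def modify(self, op_name, key, operand):
        # The read, the computation and the write all happen while holding
        # the lock, so the client never needs a separate read-before-write.
        with self._lock:
            new_value = OPERATIONS[op_name](self._props[key], operand)
            self._props[key] = new_value
            return new_value


store = PropertyStore()
store.set("foo", 3)
print(store.modify("add", "foo", 1.5))  # 4.5
```

The client sends only the operation and the operand; the read-modify-write cycle stays on the server, where it can be made atomic.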

What would "increment" mean in a non-integer case?

Michael Hunger
  • Thanks - several interesting possibilities there! After checking the dictionary, I think it is OK to talk about non-integer 'increments' (though clearly you'd have to specify the amount)! – DNA Dec 13 '11 at 21:56
2

Hmm, a bit of a different approach, but you could consider using a queuing system. I'm using the Neo4j REST interface as well and am looking into storing a constantly changing relationship strength. The project is in Rails and uses Resque. Whenever an update to the Neo4j database is required, it's put on a Resque queue to be completed by a worker. I only have one worker working on the Neo4j Resque queue, so it never tries to perform more than one Neo4j update at once.

This has the added benefit of not making the user wait for the neo4j updates when they perform an action that triggers an update. However, it is only a viable solution if you don't need to use/display the Neo4j updates instantly (though depending on the speed of your worker and the size of your queue, it should only take a few seconds).
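
The same single-worker idea can be sketched in Python with the standard library, as an illustration only. Here `queue.Queue` stands in for the Resque queue and a dictionary stands in for the Neo4j relationship properties; neither Resque nor the Neo4j REST interface is actually called.

```python
import queue
import threading

strengths = {}           # stand-in for relationship strength properties
updates = queue.Queue()  # stand-in for the Resque queue


def worker():
    # The only code that touches `strengths`, so the read-before-write
    # below can never interleave with another update.
    while True:
        item = updates.get()
        if item is None:  # sentinel: no more work
            break
        key, delta = item
        strengths[key] = strengths.get(key, 0) + delta


def produce(count):
    # Producers only enqueue; they never read or write the store directly.
    for _ in range(count):
        updates.put((("A", "knows", "B"), 1))


consumer = threading.Thread(target=worker)
consumer.start()

producers = [threading.Thread(target=produce, args=(1000,)) for _ in range(4)]
for p in producers:
    p.start()
for p in producers:
    p.join()

updates.put(None)  # tell the worker to stop once the queue is drained
consumer.join()

print(strengths[("A", "knows", "B")])  # 4000
```

Producers return as soon as the update is enqueued, which mirrors the point about users not waiting; the price is that the stored value lags behind until the worker catches up.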

Marc
1

Depends a bit on what read and write load you are targeting. How big is the total graph going to be?

Peter Neubauer
  • At a rough guess, I'd say a few tens of millions of nodes. The number of relationships is less certain, but probably a small multiple of the number of nodes. The graph would be constantly updated, adding or updating tens or a few hundreds of entities per second. The read load would probably be quite light; selecting a small number of nodes in the locality of a specified node, for example. – DNA Dec 13 '11 at 10:05
  • Hmm, if you can group the updates into transactions larger than one update each, you should be fine performance-wise. – Peter Neubauer Dec 14 '11 at 12:27
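
A minimal sketch of grouping streamed updates into larger batches, under the assumption that each batch would map to one transaction. `FakeStore` and `apply_batch` are hypothetical stand-ins, not Neo4j calls, and the batch size of 100 is arbitrary.

```python
from itertools import islice


def batches(iterable, size):
    """Yield successive lists of up to `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


class FakeStore:
    """Hypothetical store where one apply_batch call models one transaction."""

    def __init__(self):
        self.strengths = {}
        self.commits = 0

    def apply_batch(self, batch):
        # Stage all changes on a copy, then swap it in as a single "commit".
        staged = dict(self.strengths)
        for key, delta in batch:
            staged[key] = staged.get(key, 0) + delta
        self.strengths = staged
        self.commits += 1


store = FakeStore()
stream = [(("A", "knows", "B"), 1)] * 250  # 250 incoming increments

for batch in batches(stream, 100):
    store.apply_batch(batch)

print(store.commits)                       # 3
print(store.strengths[("A", "knows", "B")])  # 250
```

Here 250 updates cost three commits instead of 250, which is the kind of saving the comment above is pointing at.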