
I want to add a new property (and sometimes add edges) to a selection of nodes in an existing graph of 2 million nodes and 10+ million edges. I thought of using BatchGraph, but from its wiki it looks like it does not support any retrieval queries.

For example: retrieve the nodes matching `g.V('id',1).has('prop1','text1')` and update `prop1` to `text2`.

What is the best way to do this?
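In Gremlin (Groovy) terms, a sketch of what I'm after (the property names are just the placeholders from above):

```
// find the matching vertices and rewrite the property
g.V('id',1).has('prop1','text1').each { it.setProperty('prop1','text2') }
g.commit()
```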

Alfiyum

1 Answer


I don't think you need to use BatchGraph here. It sounds as if you are doing a large graph mutation, in which case it would probably be best to just write a Gremlin script to do your changes. You don't have a very large graph, so unless you plan to do some very complex mutations (e.g., a fat multi-step traversal), it shouldn't take very long to execute. If you do think it's going to run "long", you should think of ways to parallelize the job. If you go this route you might consider using gpars.
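For instance, a minimal sketch of such a Gremlin (Groovy) mutation script, committing in batches so locks aren't held for long (the property names and batch size are just placeholders):

```
// iterate the matching vertices, mutate, and commit periodically
def count = 0
g.V('prop1','text1').each { v ->
    v.setProperty('prop1', 'text2')      // the property update
    // v.addEdge('likes', otherVertex)   // add edges here as well if needed
    if (++count % 10000 == 0) g.commit() // keep transactions small
}
g.commit()
```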

As your graph grows, you will find that you will need to use Faunus for most data administration. Specifically, that means utilizing the script step.
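Roughly (per the Faunus wiki; details vary by version), a script step is a Groovy file defining `setup`/`map`/`cleanup` functions that Faunus invokes for each vertex, run as `g.V.script('my-script.groovy')`. A hedged sketch, where the file name and Titan configuration are assumptions:

```
// my-script.groovy -- executed by Faunus via g.V.script('my-script.groovy')
def g

def setup(args) {
    // open a connection to the live Titan graph (config path is a placeholder)
    g = com.thinkaurelius.titan.core.TitanFactory.open('titan-cassandra.properties')
}

def map(v, args) {
    def u = g.v(v.id)               // look up the corresponding Titan vertex
    u.setProperty('prop1', 'text2') // apply the mutation
}

def cleanup(args) {
    g.commit()
    g.shutdown()
}
```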

stephen mallette
  • Thanks. Well, the graph will grow eventually, so I was looking for better ways. I'll take a look at gpars. Also, in my graph [Cassandra-ES, has User and Place nodes in the millions] I noticed that when I'm adding edges between existing nodes [using Java, about 7 threads], reading from a file, User --> Place with a 'likes' property set on the edge, it throws a large number of 'com.thinkaurelius.titan.diskstorage.locking.PermanentLockingException: Local lock contention' exceptions. Any better way to handle this? – Alfiyum Jun 06 '14 at 02:25
  • Updated my answer a bit to reflect "eventual growth" of your graph. If you have locks you should expect failure at times. You should have some sort of retry strategy in place to re-execute your transaction in the event of failure. Blueprints has the `TransactionRetryHelper` (https://github.com/tinkerpop/blueprints/wiki/Graph-Transactions#transaction-retry-helper) that can be useful in that regard. You might also consider preprocessing your file so you can turn `storage.batch-loading` on with confidence (which will turn locking off) – stephen mallette Jun 06 '14 at 09:45
  • The Faunus script step looks great. In the past I also tried using TransactionRetryHelper, and still saw a lot of `PermanentLockingException` messages in the log. Just trying to understand: is it because adding an edge updates the adjacency list of both the out and the in vertex? Since my data set [User --> Places] is many-to-many, the moment multiple threads try to update the graph, these exceptions are bound to happen, correct? So given this, can I enable storage.batch-loading? – Alfiyum Jun 06 '14 at 10:59
  • It might not be safe to enable batch-loading if you need to ensure uniqueness. You would need to pre-process your data into a unique list of users/places. Load that first, then the edges, with batch-loading on. If you can't preprocess for some reason, then you should try to release locks more quickly (by commit/rollback). You don't mention how big your transaction sizes are, but the longer you leave a lock open, the greater the chance of a locking exception. Perhaps lowering your transaction size would help. – stephen mallette Jun 06 '14 at 13:07
  • You also want to make sure that you don't create weird race conditions with your locks. Say that within a transaction you have a lock on userId and placeId. One transaction in one thread grabs the userId lock but fails to get the placeId lock. It fails to get the placeId lock because a different thread has that placeId lock, but not the userId lock. Neither can complete. Sometimes having some additional randomization in the retry delay can help with those situations, but if they can be avoided altogether, so much the better. – stephen mallette Jun 06 '14 at 13:10
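To make the retry approach from these comments concrete, a hedged Gremlin (Groovy) sketch of a retry loop with a randomized delay (`maxAttempts`, the delays, and the `uid`/`pid` lookups are placeholders, not Titan API):

```
def rand = new Random()
def maxAttempts = 5

for (attempt in 1..maxAttempts) {
    try {
        def u = g.V('userId', uid).next()
        def p = g.V('placeId', pid).next()
        u.addEdge('likes', p)
        g.commit()
        break
    } catch (e) {
        g.rollback()                              // release held locks quickly
        if (attempt == maxAttempts) throw e
        sleep(100 * attempt + rand.nextInt(200))  // randomized back-off
    }
}
```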