I have a piece of code that deletes a vertex and commits the transaction. The next operation still sees the vertex for some reason. It is also strange that it only sees it sometimes, possibly depending on timing. For example, given the graph service--contains-->route:
Operation 1: deletes the contains edge, deletes the route vertex, and commits.
Operation 2: gets the contains edges from the service node and still gets the route node that was deleted in operation 1.
The two operations run one after the other, not in parallel, so operation 2 cannot be reading before the first commit.
Also, if the first commit completed successfully, my understanding is that all other threads should see the update immediately.
I am using the JanusGraph API for Java with Cassandra as the storage backend.
Sample pseudocode:
synchronized methodA:
    do some operations
    figure out route X need to be deleted from graph
    get all routes using contains edge from service node
    // service---contains--> route
    get route X from all routes
    singlethreadExecutor.submitTask(DeleteRoute X)
    update some other DB with service without route X

Task DeleteRoute (route x):
    get route X from graph DB
    delete route X vertex
    commit
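To make the flow concrete, here is a minimal Java sketch of the same shape. It is not our real code: plain in-memory sets stand in for the graph and for the other DB, and the names RouteFlowSketch, methodA and awaitLastDelete are made up for illustration. No JanusGraph calls are involved.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for the flow above; plain sets replace the graph and the other DB.
class RouteFlowSketch {
    private final Set<String> graphRoutes = ConcurrentHashMap.newKeySet();   // service--contains-->route
    private final Set<String> otherDbRoutes = ConcurrentHashMap.newKeySet(); // the "other DB"
    // Daemon thread so that an idle executor does not keep the JVM alive.
    private final ExecutorService deleteExecutor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "delete-route");
        t.setDaemon(true);
        return t;
    });
    private Future<?> lastDelete; // guarded by "this"

    RouteFlowSketch(Collection<String> initialRoutes) {
        graphRoutes.addAll(initialRoutes);
        otherDbRoutes.addAll(initialRoutes);
    }

    // Returns the routes that this call read from the stand-in graph.
    synchronized Set<String> methodA(String routeToDelete) {
        Set<String> routes = new TreeSet<>(graphRoutes);   // get all routes via the contains edge
        if (routes.contains(routeToDelete)) {              // get route X from all routes
            // Task DeleteRoute: runs later on the executor thread (the real task would then commit).
            lastDelete = deleteExecutor.submit(() -> { graphRoutes.remove(routeToDelete); });
            otherDbRoutes.remove(routeToDelete);           // update the other DB
        }
        return routes;
    }

    // Blocks until the most recently submitted delete task has finished.
    void awaitLastDelete() {
        Future<?> pending;
        synchronized (this) { pending = lastDelete; }
        if (pending == null) return;
        try {
            pending.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        RouteFlowSketch s = new RouteFlowSketch(Arrays.asList("R1", "R2", "R3", "R4"));
        System.out.println("operation 1 read: " + s.methodA("R3")); // [R1, R2, R3, R4]
        s.awaitLastDelete();
        System.out.println("operation 2 read: " + s.methodA("R3")); // [R1, R2, R4]
    }
}
```

In this sketch the monitor on methodA is released as soon as the task has been submitted, so the sketch waits on the Future explicitly before the second call. With that wait in place, the second read no longer contains R3, which is the behaviour I expect from the real graph.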
Operation 1 calls into methodA:
    service has 4 routes: R1, R2, R3, R4
    expected to delete R3
    works as expected
    R3 is deleted from the graph as well as from the other DB
Operation 2 calls into methodA:
    service is expected to have routes R1, R2, R4 in the graph
    however, methodA still gets all 4 routes, including R3, which was deleted in operation 1
Please note that methodA is synchronized, so operations 1 and 2 do not collide with each other: operation 1 completes, and only then does operation 2 start.
This is puzzling to me, especially since my logs indicate that the commit completed for operation 1, yet operation 2 still gets route node R3 from the graph through the JanusGraph API.
We are not using threaded transactions, and we do not open new transactions explicitly. We rely on TinkerPop opening a new transaction with the first operation on the thread.
Log snippets:
Operation 1:
2019-06-17 14:58:25,213 | deleteNode : route:1560307936368:1683669533
2019-06-17 14:58:25,216 | commit
2019-06-17 14:58:25,350 | Time Taken in commit = 133
Operation 2:
2019-06-17 14:58:25,738 | updateNode
2019-06-17 14:58:25,739 | updateNode Node to be updated: route:1560307936368:1683669533
2019-06-17 14:58:25,740 | updateVertex: vertex updated for key: route:1560307936368:1683669533
2019-06-17 14:58:25,741 | updateNode Time Taken in updateNode = 3
As you can see, operation 1 deletes the route node and commits, and operation 2, when it reads from the graph, still gets the same route node and is able to update it. Our update API checks whether the vertex is present before updating it and throws an error if it is not.
So clearly the vertex is still returned from the graph by the JanusGraph getVertex API, looked up by node id key, even though the delete succeeded and the commit completed just before it.
The same code works as expected if the time between the two operations is stretched to more than a couple of minutes.
We have also configured JanusGraph to use its cache.
Given all of this, I am really puzzled about how this can happen.
I could understand it if the two operations were somehow running in parallel and stepping on each other, since race conditions can return stale data, but the operations are synchronized and happen one after the other.
Expected behaviour: the vertex should not be returned in the second operation after it has been deleted and committed in the first, especially when both operations are synchronized and run one after the other without any failures or exceptions.
Use case 1:
Thread-1 ----calls---> synchronized method-1---> get edge/vertex, update vertex, commit ----submits ---> singleThreadedExecutorTask ---> delete edge/vertex, commit ----> calls --> synchronized method-1 (for operation 2) ----> here the get edge/vertex still gets the old edge/vertex
I can understand use case 1: the transaction is scoped to the thread and opened by its first operation, and anything committed in other threads is not visible within that transaction, so ideally I have to commit the transaction before starting operation 2 to see the changes.
I tried this for use case 1 and it works as expected!
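The thread-scoped behaviour described above can be illustrated with a toy model. This is only my understanding expressed as plain Java, not how JanusGraph is implemented: a thread's transaction opens lazily on its first operation, reads from a snapshot taken at that moment, and ends on commit. The class ThreadBoundTxModel and all of its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Toy model only (not the JanusGraph API): each thread gets its own lazily opened
// transaction that reads from a snapshot until that thread calls commit().
class ThreadBoundTxModel {
    private static final class Tx {
        final Set<String> view;                       // what this transaction can see
        final Set<String> removed = new HashSet<>();  // deletions to apply on commit
        Tx(Set<String> snapshot) { view = snapshot; }
    }

    private final Set<String> committed = new HashSet<>(); // guarded by "committed"
    private final ThreadLocal<Tx> current = new ThreadLocal<>();

    ThreadBoundTxModel(Collection<String> initialVertices) {
        committed.addAll(initialVertices);
    }

    // Opens a transaction for the calling thread on its first operation.
    private Tx tx() {
        Tx tx = current.get();
        if (tx == null) {
            synchronized (committed) { tx = new Tx(new HashSet<>(committed)); }
            current.set(tx);
        }
        return tx;
    }

    boolean contains(String id) { return tx().view.contains(id); }

    void remove(String id) {
        Tx tx = tx();
        if (tx.view.remove(id)) tx.removed.add(id);
    }

    // Applies this thread's deletions and closes its transaction.
    void commit() {
        Tx tx = current.get();
        if (tx == null) return;
        synchronized (committed) { committed.removeAll(tx.removed); }
        current.remove();
    }

    // Helper: runs the work on a separate thread and waits for it to finish.
    static void runAndJoin(Runnable work) {
        Thread t = new Thread(work);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ThreadBoundTxModel g = new ThreadBoundTxModel(Arrays.asList("R3"));
        System.out.println(g.contains("R3")); // true; this thread's transaction is now open
        runAndJoin(() -> { g.remove("R3"); g.commit(); }); // delete and commit on another thread
        System.out.println(g.contains("R3")); // true; stale view from the still-open transaction
        g.commit();                           // close this thread's transaction
        System.out.println(g.contains("R3")); // false; a fresh transaction sees the delete
    }
}
```

In this model the stale read disappears as soon as the reading thread commits before its next read, which matches what I observed when I added the commit.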
Use case 2:
Thread-1 ----calls---> synchronized method-1---> get edge/vertex, update vertex, commit ----submits ---> singleThreadedExecutorTask ---> delete edge/vertex, commit ----> Thread-1 Completes.
After around one minute:
Thread-2 ----calls---> synchronized method-1---> get edge/vertex, update vertex, commit ----submits ---> singleThreadedExecutorTask ---> delete edge/vertex, commit ----> Thread-2 Completes.
Problem: Thread-2's call into synchronized method-1 still gets the old edge/vertex that was deleted as part of the Thread-1 process.
Now, in this case:
Thread-1's scoped transaction is opened with its first graph operation and is closed immediately after the update. After that, the singleThreadedExecutor task runs in a separate thread, so it opens its own new transaction with its first operation and closes that transaction with a commit when the task finishes.
Thread-2, when it starts after a minute, opens its own thread-scoped transaction with its first graph operation. This get operation, in the new thread's transaction scope, should be able to read the correct data, without the edge/vertex deleted by Thread-1, especially considering that it starts almost a minute later. This is not even a clustered setup. And even with a clustered setup, I thought the quorum has to be satisfied before the commit call can return, and the rest of the replication can then happen independently (delayed).
This is the part I am not able to understand. Of course, if I intervene manually with the two threads, for example by starting thread 1 after maybe 2 minutes, it works for some reason.
Two minutes seems really long for eventual consistency in this case.
So what options does the application have to handle this?
Is there any way to force a graph operation to wait for eventual consistency? For example, in Thread-2 I could specify that the first get operation has to wait until it returns consistent data, with all conflicts resolved.
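To illustrate the kind of waiting I have in mind, here is a minimal application-level polling sketch. It is not a JanusGraph feature that I know of; ConsistencyWait and awaitCondition are hypothetical names, and in real code the condition would be something like "a fresh read no longer returns the deleted vertex".

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical helper: re-checks a condition until it holds or a timeout elapses.
class ConsistencyWait {
    // Returns true if the condition became true in time, false on timeout or interrupt.
    static boolean awaitCondition(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) return true;
            if (System.nanoTime() - deadline >= 0) return false; // timed out
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger reads = new AtomicInteger();
        // Stand-in check: "the deleted vertex is gone" becomes true on the third read.
        boolean consistent = awaitCondition(() -> reads.incrementAndGet() >= 3, 1_000, 10);
        System.out.println(consistent + " after " + reads.get() + " reads"); // true after 3 reads
    }
}
```

A wait like this only hides the delay behind a timeout, which is why I would prefer a proper way to get a consistent read.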
I don't think opening a new transaction in Thread-2, or doing some sort of global commit to close any previously opened stale transaction, is the right way to do it, as this is just the start of a new thread.