I was able to reproduce and fix it by following the Data Consistency documentation. I was missing the following command after setting the ConsistencyModifier:
mgmt.commit()
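For completeness, the same schema change can also be made from Java rather than the Gremlin console; a minimal sketch using the TitanManagement API, assuming an already open graph handle:

import com.thinkaurelius.titan.core.RelationType;
import com.thinkaurelius.titan.core.schema.ConsistencyModifier;
import com.thinkaurelius.titan.core.schema.TitanManagement;

TitanManagement mgmt = graph.openManagement();
RelationType prioChild = mgmt.getRelationType("prio_child");
mgmt.setConsistency(prioChild, ConsistencyModifier.LOCK);
mgmt.commit();  // without this commit the modifier is never persisted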
The following piece of code reproduced the problem with both versions of Cassandra, i.e. Cassandra 2.1.x and 3.9.x (shown here wrapped in a minimal class so it compiles as-is):
import java.util.Random;

import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.Edge;

import com.thinkaurelius.titan.core.TitanFactory;
import com.thinkaurelius.titan.core.TitanGraph;
import com.thinkaurelius.titan.core.TitanTransaction;

public class EdgeUpdateRepro {
    public static void main(String[] args) throws Exception {
        TitanGraph graph = TitanFactory.open("/opt/cmsgraph/config/edgepoc.conf");
        try {
            int parent = -2128958273;
            int child = 58541705;
            int hostid = 83;
            int numThreads = 10;
            Thread[] threads = new Thread[numThreads];
            // Every thread updates the same parent->child edge concurrently.
            for (int i = 0; i < numThreads; i++) {
                threads[i] = new Thread(new EdgeUpdator(graph, parent, child, hostid));
            }
            for (int i = 0; i < numThreads; i++) {
                threads[i].start();
            }
            for (int i = 0; i < numThreads; i++) {
                threads[i].join();
            }
        } finally {
            graph.close();
        }
    }

    private static class EdgeUpdator implements Runnable {
        private final int parent;
        private final int child;
        private final int hostid;
        private final TitanGraph graph;

        public EdgeUpdator(TitanGraph graph, int parent, int child, int hostid) {
            this.graph = graph;
            this.parent = parent;
            this.child = child;
            this.hostid = hostid;
        }

        public void run() {
            // Each thread looks up the same edge and rewrites two of its
            // properties in its own transaction.
            TitanTransaction trxn = graph.newTransaction();
            GraphTraversalSource g = trxn.traversal();
            Edge edge = (Edge) g.V().has("msid", parent)
                    .outE("prio_child").has("hostid_e", hostid)
                    .as("e").inV().has("msid", child)
                    .select("e").next();
            Random random = new Random(System.nanoTime());
            edge.property("updatedAt_e", random.nextLong());
            edge.property("plrank", random.nextInt());
            trxn.commit();
        }
    }
}
Before executing the above code, I see:
gremlin> g.V().has('msid', -2128958273).outE('prio_child').has('hostid_e', 83).as('e').inV().has('msid', 58541705).select('e')
==>e[239suvpz-17ofqw-41ed-9eutzq8][73363640-prio_child->20489355296]
gremlin> g.V().has('msid', -2128958273).outE('prio_child').has('hostid_e', 83).as('e').inV().has('msid', 58541705).select('e').count()
==>1
gremlin> g.V().has('msid', -2128958273).outE('prio_child').has('hostid_e', 83).count()
==>104
After executing the code, I see:
gremlin> g.V().has('msid', -2128958273).outE('prio_child').has('hostid_e', 83).as('e').inV().has('msid', 58541705).select('e')
==>e[239suvpz-17ofqw-41ed-9eutzq8][73363640-prio_child->20489355296]
==>e[239suvpz-17ofqw-41ed-9eutzq8][73363640-prio_child->20489355296]
==>e[239suvpz-17ofqw-41ed-9eutzq8][73363640-prio_child->20489355296]
==>e[239suvpz-17ofqw-41ed-9eutzq8][73363640-prio_child->20489355296]
==>e[239suvpz-17ofqw-41ed-9eutzq8][73363640-prio_child->20489355296]
==>e[239suvpz-17ofqw-41ed-9eutzq8][73363640-prio_child->20489355296]
==>e[239suvpz-17ofqw-41ed-9eutzq8][73363640-prio_child->20489355296]
==>e[239suvpz-17ofqw-41ed-9eutzq8][73363640-prio_child->20489355296]
==>e[239suvpz-17ofqw-41ed-9eutzq8][73363640-prio_child->20489355296]
==>e[239suvpz-17ofqw-41ed-9eutzq8][73363640-prio_child->20489355296]
gremlin> g.V().has('msid', -2128958273).outE('prio_child').has('hostid_e', 83).as('e').inV().has('msid', 58541705).select('e').count()
==>10
gremlin> g.V().has('msid', -2128958273).outE('prio_child').has('hostid_e', 83).as('e').inV().has('msid', 58541705).select('e').dedup().count()
==>1
gremlin> g.V().has('msid', -2128958273).outE('prio_child').has('hostid_e', 83).count()
==>113
gremlin> g.V().has('msid', -2128958273).outE('prio_child').count()
==>104
After applying ConsistencyModifier.LOCK to the "prio_child" edge label, I observed that 9 of the 10 threads failed with the following exception, and the run no longer produced multiple edges with the same edge id, presumably because the lock lets only one transaction at a time rewrite the edge:
Exception in thread "Thread-8" org.apache.tinkerpop.gremlin.process.traversal.util.FastNoSuchElementException
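If the updater threads shouldn't simply die on lock contention, run() can retry the update instead; a minimal sketch (the retry limit of 3 is my arbitrary choice, not something from the original code):

public void run() {
    for (int attempt = 0; attempt < 3; attempt++) {  // arbitrary retry limit
        TitanTransaction trxn = graph.newTransaction();
        try {
            GraphTraversalSource g = trxn.traversal();
            Edge edge = (Edge) g.V().has("msid", parent)
                    .outE("prio_child").has("hostid_e", hostid)
                    .as("e").inV().has("msid", child)
                    .select("e").next();
            Random random = new Random(System.nanoTime());
            edge.property("updatedAt_e", random.nextLong());
            edge.property("plrank", random.nextInt());
            trxn.commit();
            return;  // committed successfully, stop retrying
        } catch (Exception e) {
            // The losing side of a lock conflict surfaces as an exception;
            // roll back and retry in a fresh transaction.
            if (trxn.isOpen()) {
                trxn.rollback();
            }
        }
    }
}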
Following are the exact changes I made:
mgmt = graph.openManagement()
prio_child=mgmt.getRelationType('prio_child')
mgmt.setConsistency(prio_child, ConsistencyModifier.LOCK)
mgmt.commit()
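As a sanity check, the modifier can be read back in a fresh management transaction to confirm the commit actually persisted it (getConsistency belongs to the same management API):

mgmt = graph.openManagement()
mgmt.getConsistency(mgmt.getRelationType('prio_child'))  // should report LOCK
mgmt.rollback()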