I'm getting an unexpected StopIteration error with some Gremlin queries that contain a count step within nested filter steps.

This error can be recreated with the following code (using Gremlin-Python, 3.5.0 in my case):

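from gremlin_python.process.graph_traversal import __
from gremlin_python.process.traversal import P

# `g` is assumed to be a GraphTraversalSource already connected to the graph
filter_header = g.addV().id().next()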
count_headers = [g.addV().id().next() for _ in range(10)]

for i, c in enumerate(count_headers):
    # Add 10 nodes
    sub_nodes = [g.addV().id().next() for _ in range(10)]
    # Connect them all to the header
    for s in sub_nodes:
        g.V(c).addE('edge').to(__.V(s)).iterate()
    # Connect i of them to the filter header
    for s in sub_nodes[:i]:
        g.V(filter_header).addE('edge').to(__.V(s)).iterate()

# This raises StopIteration
g.V(count_headers).filter(
    __.out('edge').filter(
        __.in_('edge').hasId(filter_header)
    ).count().is_(P.gt(1))
).count().next()

(Equivalently, if I use toList instead of next, I get an empty list.)
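
For reference, the count this query should return can be worked out without a graph database. The following is a minimal sketch in plain Python that models the graph built by the setup code with a dictionary; the vertex ids and the names `out_edges`, `matching_successors` and `expected` are made up for illustration and are not part of Gremlin-Python.

```python
# Plain-Python model of the graph built by the setup code (no Gremlin needed).
# Vertex ids are made-up strings; only the edge structure matters.
filter_header = "filter"
count_headers = [f"count-{i}" for i in range(10)]

# vertex id -> set of vertex ids reached by an outgoing 'edge'
out_edges = {filter_header: set()}
for i, c in enumerate(count_headers):
    sub_nodes = [f"sub-{i}-{j}" for j in range(10)]
    out_edges[c] = set(sub_nodes)                    # header -> all 10 sub-nodes
    out_edges[filter_header].update(sub_nodes[:i])   # filter header -> first i sub-nodes


def matching_successors(header):
    # Mirrors: out('edge').filter(in_('edge').hasId(filter_header)).count()
    return sum(1 for s in out_edges[header] if s in out_edges[filter_header])


# Outer filter keeps headers whose inner count is greater than 1; then count them.
expected = sum(1 for c in count_headers if matching_successors(c) > 1)
print(expected)  # 8
```

The inner counts are 0 through 9 for the ten headers, so eight of them pass the `gt(1)` test.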

However, the error doesn't occur if you unfold after the count:

# No StopIteration
g.V(count_headers).filter(
    __.out('edge').filter(
        __.in_('edge').hasId(filter_header)
    ).count().unfold().is_(P.gt(1))
).count().next()

Neither does it happen if you use map instead of filter:

# No StopIteration
g.V(count_headers).as_('c').map(
    __.out('edge').filter(
        __.in_('edge').hasId(filter_header)
    ).count().is_(P.gt(1))
).select('c').count().next()

I've tested and this error doesn't happen when using TinkerGraph, so I suspect this is specific to AWS Neptune.

I'd really appreciate any guidance on why this happens, whether I'm doing anything wrong, or what differences cause this to happen only in Neptune. Alternatively, if the consensus is that this is a bug, I'd appreciate it if anyone could let me know where to report it.

2 Answers

With a Gremlin client such as Gremlin-Python, calling next on a query that has no results raises an error (StopIteration in Python). I prefer to always use toList, as that way you are guaranteed to get at least an empty list back. If you use TinkerGraph locally with the Gremlin Console you will not see the same behavior. If getting no result is itself unexpected, that is a second issue to explore.

As an example of the Python next behavior, here is a simple experiment using the Python console. If you run your same tests with a Gremlin Server backed by TinkerGraph you will see the same results.


>>> g.V().hasId('I do not exist').next()

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ec2-user/.local/lib/python3.6/site-packages/gremlin_python/process/traversal.py", line 89, in next
    return self.__next__()
  File "/home/ec2-user/.local/lib/python3.6/site-packages/gremlin_python/process/traversal.py", line 50, in __next__
    self.last_traverser = next(self.traversers)
StopIteration
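
The toList approach can be wrapped in a small helper. This is only a sketch: `first_or_none` and `FakeTraversal` are hypothetical names, and `FakeTraversal` is a stand-in so the example runs without a server. The only call assumed on a real Gremlin-Python traversal is `toList()`.

```python
def first_or_none(traversal):
    """Return the first result of a traversal, or None if there are no results."""
    results = traversal.toList()
    return results[0] if results else None


class FakeTraversal:
    """Hypothetical stand-in for a Gremlin-Python traversal (offline demo only)."""

    def __init__(self, results):
        self._results = list(results)

    def toList(self):
        return list(self._results)

    def next(self):
        # Like the traceback above: an empty result set raises StopIteration.
        return next(iter(self._results))


print(first_or_none(FakeTraversal([])))    # None
print(first_or_none(FakeTraversal([8])))   # 8
```

With a real connection the call would look like `first_or_none(g.V().hasId('I do not exist'))`. Note that `toList()` fetches every result, so this suits queries that return a single value, such as a count.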

Kelvin Lawrence
  • Hi Kelvin, thanks for your reply! In this case, the unexpected behaviour is that the query has no result. So you're right that `toList` doesn't raise an error, but as you say it returns an empty list. If I run the query with `toList` on both Neptune and TinkerGraph, I get `[]` on Neptune and `[8]` on TinkerGraph. The TinkerGraph result is what I would expect, whereas I don't understand why Neptune is returning an empty list. – Hugh Blayney Nov 25 '21 at 13:31
  • When you created the graph using TinkerGraph was that done using GremlinServer and Python or did you use straight `addV` and `addE` steps without the list comprehension approach? If you used `addV` steps could you please also add those to the question. – Kelvin Lawrence Nov 25 '21 at 13:43
  • I used exactly the same code in the question when testing both Neptune and TinkerGraph, just re-declaring `g` to connect to each one. – Hugh Blayney Nov 25 '21 at 13:53
  • Thanks. I’ll try to reproduce what you are seeing. – Kelvin Lawrence Nov 25 '21 at 14:32
  • Hi Kelvin - thank you again for looking into this more closely. However, I think the query you've added doesn't match up with the query I started with - you state "Every vertex that results from filter(out('edge')) only has outgoing edges but your query tests for incoming edges." You're right that the `count_headers` have no predecessors, but my query is checking if the _successors_ of the `count_headers` have predecessors. Your query would more closely match mine if the `in` step was inside the `filter` step. – Hugh Blayney Nov 25 '21 at 16:59
  • I realized after I posted that last update I had made a bad assumption so I had actually already deleted it. I'm going to look into this some more as time allows (supposed to be a holiday today :-) ) – Kelvin Lawrence Nov 25 '21 at 17:01
  • To try and reduce confusion, I think the central point of my original question could be reduced to: "given the graph created by the queries I gave, why does `g.V(count_headers).filter( __.out('edge').filter( __.in_('edge').hasId(filter_header) ).count().is_(P.gt(1)) ).count().toList()` return an empty list when `g.V(count_headers).filter( __.out('edge').filter( __.in_('edge').hasId(filter_header) ).count().unfold().is_(P.gt(1)) ).count().next()` doesn't, when TinkerGraph returns an answer in both cases?" – Hugh Blayney Nov 25 '21 at 17:04
  • Ah sorry, I'm not refreshing this page enough so I missed that! No worries at all, I appreciate all the help already. – Hugh Blayney Nov 25 '21 at 17:04
  • Thanks - that is helpful - with the `unfold` I get an 8. Is that the answer you are expecting? – Kelvin Lawrence Nov 25 '21 at 17:09
  • Yep, that's the answer I'm expecting. – Hugh Blayney Nov 25 '21 at 17:11
  • The `filter` can be simplified a little but I still had to add a `barrier` step (`unfold` will also work) to get the expected result. Will continue to investigate what might be going on. `g.V(${count_headers}). filter(out('edge').in('edge').hasId('${filter_header}').barrier().count().is(P.gt(1))).count()` – Kelvin Lawrence Nov 26 '21 at 16:23
  • Hi again. After researching this further it turns out there is a small issue with the Neptune query engine that will be fixed in a future release. For now please use the workarounds we have discussed. – Kelvin Lawrence Nov 30 '21 at 23:56
  • Good to know! Thank you for your help with this, we'll use the workarounds for the time being. – Hugh Blayney Dec 01 '21 at 11:10

For anyone that finds themselves here: this was a bug that was fixed in Neptune Engine release 1.1.1.0.

"Fixed a rare Gremlin bug where no results were returned when using nested filter() and count() steps in combination"

(Thanks to the Neptune team for fixing!)