Limit edges used on named graph traversal

Question

Q: Can I limit the edge collections the system will try to use when traversing named graphs in AQL?

Scenario:

Suppose I have a named graph productGraph with two vertex collections and two edge collections:

Vertices: product, price
Edges:
prodParentOf (product A is parent of product B)
prodHasPrice (product A has a price of $X)

If I now want the child products of product A (and no prices), I would like to do something like this:

WITH product
FOR v, e, p IN OUTBOUND 'product/A'
GRAPH 'productGraph'
RETURN {vertice:v, edge:e, path: p}

However, if I look at the explain plan, I see that the system attempted to use the indexes for both prodParentOf and prodHasPrice (even though I explicitly put the product collection in the WITH clause):

Indexes used:
 By   Type   Collection     Unique   Sparse   Selectivity   Fields               Ranges
  2   edge   prodHasPrice   false    false        75.00 %   [ `_from`, `_to` ]   base OUTBOUND
  2   edge   prodParentOf   false    false        65.37 %   [ `_from`, `_to` ]   base OUTBOUND

Can I limit the edge collections the system will try to use when querying named graphs? Or do I have to use edge collections in the query instead (which, in my mind, would mean that it is generally better to traverse edge collections than named graphs)?

Here is the same query using an edge collection:

FOR v, e, p IN OUTBOUND 'product/A'
prodParentOf
RETURN {vertice:v, edge:e, path: p}

score 0 · Accepted Answer · answered Oct 11 '18 at 09:32

The WITH clause does not restrict which of the collections in your named graph will be used in a traversal. It is mainly for traversals in a cluster, to declare which collections will be involved. This helps to avoid deadlocks, which may occur if collections are lazily locked at query runtime.

If you use a single server instance, then the WITH clause is optional. It does not have an effect on the result. If you want to exclude collections from the traversal, you can either use collection sets instead of the named graph, or use FILTERs together with IS_SAME_COLLECTION(). Using collection sets is more efficient, because with fewer edge collections there are fewer edges to traverse, whereas filters are applied after the traversal in most cases. The following filter keeps only paths whose edges all come from a single edge collection:

FOR v, e, p IN 1..5 OUTBOUND 'verts/start' GRAPH 'named-graph'
  FILTER (FOR id IN p.edges[*]._id RETURN IS_SAME_COLLECTION('edgesX', id)) ALL == true
  RETURN p

If your traversal has a depth of 1 only, then a filter query is simpler:

FOR v, e, p IN OUTBOUND 'product/A' GRAPH 'productGraph'
  FILTER IS_SAME_COLLECTION('prodParentOf', e)
  RETURN {vertex: v, edge: e, path: p}
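For comparison, the collection-set variant mentioned above can be sketched for a deeper traversal as follows. This is only an illustration: the depth range 1..5 is an assumption borrowed from the first example, and the collection names are the ones from the question.

```aql
// Declare the vertex collection up front (needed for cluster traversals,
// optional on a single server).
WITH product
// Anonymous graph: only prodParentOf is listed, so prodHasPrice edges
// are never looked at during the traversal.
FOR v, e, p IN 1..5 OUTBOUND 'product/A' prodParentOf
  RETURN p
```

Because the edge collection is named directly, no FILTER on the collection is needed.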

A way to prune paths may come in the future, which should also help with your named graph scenario.
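As far as I know, a PRUNE keyword for traversals was added in later releases (3.4.5 and up); check the documentation for your version. A minimal sketch of how it could be combined with IS_SAME_COLLECTION(), reusing the placeholder names 'verts/start', 'named-graph' and 'edgesX' from the first example:

```aql
FOR v, e, p IN 1..5 OUTBOUND 'verts/start' GRAPH 'named-graph'
  // Stop following a path as soon as its latest edge is not from edgesX.
  // The null check guards the start vertex, where there is no edge yet.
  PRUNE e != null AND NOT IS_SAME_COLLECTION('edgesX', e)
  // A pruned path is still emitted once, so drop it explicitly.
  FILTER IS_SAME_COLLECTION('edgesX', e)
  RETURN p
```

Checking only the last edge in the FILTER is enough here, because any path with an earlier edge from another collection would already have been pruned. Later versions (3.7 and up, if I recall correctly) also document an edgeCollections traversal option that restricts a named graph traversal directly.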

Thanks for the great answer! It is much clearer now. I do have a question about filtering, though. I was under the impression that filtering on vertices and edges just affects the end results, but filtering on paths actually has an effect on the traversal and which paths are used. So in your first example, would the filter be applied during the traversal? — camba1, Oct 11 '18 at 13:42
In my first example there is a subquery, which first computes an array of Boolean values. Only after that is the filter condition checked, i.e. whether all of them are true (all edges are in the edge collection 'edgesX'). Such a pattern is not recognized by the optimizer as far as I know and have seen, which means it is a post-filter. Certain other filter conditions on the `p` variable are optimized, however; see e.g. https://github.com/arangodb/arangodb/issues/1897 — CodeManX, Oct 12 '18 at 08:38
