Returning edges takes too much time

Question

SELECT * FROM cypher('age', $$
MATCH (V)-[R]-(V2)
RETURN V, R, V2
$$) AS (V agtype, R agtype, V2 agtype);

With only a few edges (around 100), the query runs quickly. With 500 or 1000 edges, however, it can take more than 10 minutes to complete. Why is this happening?

I tried running the query to return n edges.

I recommend you to see https://github.com/apache/age/issues/628. There is detailed discussion about this issue and suggested optimization techniques. — Muhammad Taha Naveed, Feb 23 '23 at 21:36

score 0 · Answer 1 · answered Feb 23 '23 at 20:24

Apache AGE must traverse the edges of the graph while executing a query, which can take a while on large graphs. Several factors affect query performance: the size of the graph, the complexity of the query, the available hardware resources, and the configuration of Apache AGE.

With few edges, the traversal is straightforward and there is little data to process, so the query runs quickly. With 500 or 1000 edges, the query has more work to do and takes longer to finish. I guess the available hardware resources, such as the number of CPUs, the amount of memory, and disk speed, may also affect how quickly the query completes.

Matheus Farias · Answer 2 · 2023-03-15T19:54:54.493

There is actually a GitHub issue that addresses this topic. Here is the link. An undirected query such as MATCH (v1)-[e:edge_label]-(v2) RETURN startNode(e), e, endNode(e) takes longer than the directed MATCH (v1)-[e:edge_label]->(v2) RETURN startNode(e), e, endNode(e). Defining a label for the edge also helps.

This is because, with a direction set, the direction of the edge is not calculated multiple times and no bidirectional scan is needed. Defining labels also helps, because you query only the labels you want and skip the unnecessary ones.
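To make the directed, labeled form concrete, here is a minimal sketch of the full statement. It assumes the graph is named 'age' as in the question and that the edges carry a label called edge_label, which is only a placeholder taken from the query above. Substitute your own label.

```sql
-- Sketch only: 'age' is the graph name from the question,
-- and edge_label is a placeholder for a label that exists in your graph.
LOAD 'age';
SET search_path = ag_catalog, "$user", public;

-- Directed (->) and labeled (:edge_label) match, so only one
-- direction and one edge label need to be considered.
SELECT * FROM cypher('age', $$
    MATCH (v1)-[e:edge_label]->(v2)
    RETURN v1, e, v2
$$) AS (v1 agtype, e agtype, v2 agtype);
```

If you really need both directions, you can compare the timing of this form against the undirected one on your own data before deciding.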

You could also write something like MATCH (V)-[R:similar*1..1]-(V2) in your query to make it faster with a VLE (Variable Length Edge). The regular MATCH uses nested joins to find the results, whereas the VLE MATCH uses a graph pathing function, so a different engine finds the matches in each case. The VLE produces candidates that need to be filtered out.
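Written out in full, the VLE variant might look like the sketch below. It assumes the graph name 'age' from the question and an edge label named similar, as in the pattern above. Note that with a variable-length pattern, R is returned as a list of edges rather than a single edge.

```sql
-- Sketch only: 'age' is the graph name from the question,
-- and 'similar' is assumed to be an existing edge label.
LOAD 'age';
SET search_path = ag_catalog, "$user", public;

-- *1..1 restricts the variable-length match to exactly one hop,
-- but routes the query through the VLE path-finding code.
SELECT * FROM cypher('age', $$
    MATCH (V)-[R:similar*1..1]-(V2)
    RETURN V, R, V2
$$) AS (V agtype, R agtype, V2 agtype);
```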

The problem with the regular MATCH is that these joins can nest way too deep, depending on the graph. This is compounded by the labels being stored in separate tables.
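One way to see how deeply the joins nest for your graph is to inspect the plan PostgreSQL generates for the query. The sketch below assumes that wrapping the cypher() call in a standard PostgreSQL EXPLAIN works in your AGE version, and it reuses the graph name 'age' from the question.

```sql
-- Sketch only: assumes EXPLAIN can wrap a cypher() call in your AGE version.
LOAD 'age';
SET search_path = ag_catalog, "$user", public;

-- Shows the plan for the original undirected, unlabeled query.
-- Look for nested joins and scans over several label tables.
EXPLAIN
SELECT * FROM cypher('age', $$
    MATCH (V)-[R]-(V2)
    RETURN V, R, V2
$$) AS (V agtype, R agtype, V2 agtype);
```

Running the same EXPLAIN on the directed or labeled variants lets you compare the plans side by side.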
