I have the following graph model:
(:PageView {Number:int, Page:string}), (:Page {Name:string})
(:PageView)-[:At]->(:Page)
(:PageView)-[:Next]->(:PageView)
Schema:
Indexes
ON :Page(Name) ONLINE (for uniqueness constraint)
ON :PageView(Page) ONLINE
ON :PageView(Revision) ONLINE (for uniqueness constraint)
Constraints
ON (pageview:PageView) ASSERT pageview.Number IS UNIQUE
ON (page:Page) ASSERT page.Name IS UNIQUE
I want to do something similar to this post.
I have tried to find popular paths without loops that have this structure:
(:PageView)-[:Next*2]->(:PageView)
Here are my attempts:
1. Nicole White's method from that post
MATCH p = (:PageView)-[:Next*2]->(:PageView)
WITH p, EXTRACT(v IN NODES(p) | v.Page) AS pages
UNWIND pages AS views
WITH p, COUNT(DISTINCT views) AS distinct_views
WHERE distinct_views = LENGTH(NODES(p))
RETURN EXTRACT(v in NODES(p) | v.Page), count(p)
ORDER BY count(p) DESC
LIMIT 10;
profile output:
10 rows
177270 ms
Compiler CYPHER 2.2-rule
ColumnFilter(0)
|
+Extract(0)
|
+ColumnFilter(1)
|
+Top
|
+EagerAggregation(0)
|
+Extract(1)
|
+ColumnFilter(2)
|
+Filter(0)
|
+Extract(2)
|
+ColumnFilter(3)
|
+EagerAggregation(1)
|
+UNWIND
|
+ColumnFilter(4)
|
+Extract(3)
|
+ExtractPath
|
+Filter(1)
|
+TraversalMatcher
+---------------------+---------+----------+------------------------------------------------------------------------+------------------------------------------------------------------------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+---------------------+---------+----------+------------------------------------------------------------------------+------------------------------------------------------------------------------------------------+
| ColumnFilter(0) | 10 | 0 | EXTRACT(v in NODES(p) | v.Page), count(p) | keep columns EXTRACT(v in NODES(p) | v.Page), count(p) |
| Extract(0) | 10 | 0 | FRESHID225, FRESHID258, EXTRACT(v in NODES(p) | v.Page), count(p) | EXTRACT(v in NODES(p) | v.Page), count(p) |
| ColumnFilter(1) | 10 | 0 | FRESHID225, FRESHID258 | keep columns , |
| Top | 10 | 0 | FRESHID225, INTERNAL_AGGREGATEf7fa022b-cdb5-4ef2-bec5-a2f4f10706b6 | { AUTOINT0}; Cached( INTERNAL_AGGREGATEf7fa022b-cdb5-4ef2-bec5-a2f4f10706b6 of type Integer) |
| EagerAggregation(0) | 212828 | 0 | FRESHID225, INTERNAL_AGGREGATEf7fa022b-cdb5-4ef2-bec5-a2f4f10706b6 | |
| Extract(1) | 1749120 | 10494720 | FRESHID225, distinct_views, p | |
| ColumnFilter(2) | 1749120 | 0 | distinct_views, p | keep columns distinct_views, p |
| Filter(0) | 1749120 | 0 | FRESHID196, distinct_views, p | CoercedPredicate(anon[196]) |
| Extract(2) | 2115766 | 0 | FRESHID196, distinct_views, p | |
| ColumnFilter(3) | 2115766 | 0 | distinct_views, p | keep columns p, distinct_views |
| EagerAggregation(1) | 2115766 | 0 | INTERNAL_AGGREGATEb0939c81-a40c-4012-afd6-4852b17cf2e4, p | p |
| UNWIND | 6347298 | 0 | p, pages, views | |
| ColumnFilter(4) | 2115766 | 0 | p, pages | keep columns p, pages |
| Extract(3) | 2115766 | 12694596 | p, pages | pages |
| ExtractPath | 2115766 | 0 | p | |
| Filter(1) | 2115766 | 2115766 | | hasLabel(anon[34]:PageView(0)) |
| TraversalMatcher | 2115766 | 16926150 | | , , , |
+---------------------+---------+----------+------------------------------------------------------------------------+------------------------------------------------------------------------------------------------+
Total database accesses: 42231232
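In the first attempt, the UNWIND and COUNT(DISTINCT views) steps only test whether all page names along a path are different. The same test can be written as a single WHERE predicate that compares every pair of positions, which should remove the UNWIND and the extra aggregation from the plan. This is a sketch that has not been profiled against this data set. It uses Cypher 2.x syntax (EXTRACT, LENGTH on a collection); newer Neo4j versions replace EXTRACT with a list comprehension and LENGTH on a list with size().

```cypher
// Sketch: loop-free 2-hop paths, distinctness checked pairwise in one predicate
MATCH p = (:PageView)-[:Next*2]->(:PageView)
WITH EXTRACT(v IN NODES(p) | v.Page) AS pages
// keep the path only if no page name occurs twice on it
WHERE ALL(i IN RANGE(0, LENGTH(pages) - 2)
          WHERE ALL(j IN RANGE(i + 1, LENGTH(pages) - 1)
                    WHERE pages[i] <> pages[j]))
RETURN pages, COUNT(*) AS count
ORDER BY count DESC
LIMIT 10;
```

Because the predicate is written over positions rather than fixed names, the same query works for other path lengths by changing only `*2`.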
2.
match (p1:PageView)-[:Next]->(p2:PageView)-[:Next]->(p3:PageView)
where p1.Page<>p2.Page and p1.Page<>p3.Page and p2.Page<>p3.Page
RETURN [p1.Page,p2.Page,p3.Page], count(*) as count
ORDER BY count DESC
LIMIT 10;
profile output:
10 rows
28660 ms
Compiler CYPHER 2.2-cost
Projection(0)
|
+Top
|
+EagerAggregation
|
+Projection(1)
|
+Filter(0)
|
+Expand(0)
|
+Filter(1)
|
+Expand(1)
|
+NodeByLabelScan
+------------------+---------------+---------+----------+------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Operator | EstimatedRows | Rows | DbHits | Identifiers | Other |
+------------------+---------------+---------+----------+------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Projection(0) | 1241 | 10 | 0 | FRESHID146, [p1.Page,p2.Page,p3.Page], count | [p1.Page,p2.Page,p3.Page], count |
| Top | 1241 | 10 | 0 | FRESHID146, count | { AUTOINT0}; count |
| EagerAggregation | 1241 | 212828 | 0 | FRESHID146, count | |
| Projection(1) | 1542393 | 1749120 | 10494720 | FRESHID146, p1, p2, p3 | |
| Filter(0) | 1542393 | 1749120 | 17872173 | p1, p2, p3 | (((hasLabel(p3:PageView(0)) AND NOT(Property(p1,Page(3)) == Property(p3,Page(3)))) AND NOT(anon[20] == anon[43])) AND NOT(Property(p2,Page(3)) == Property(p3,Page(3)))) |
| Expand(0) | 1904189 | 1985797 | 3971596 | p1, p2, p3 | (p2)-[:Next]->(p3) |
| Filter(1) | 1904191 | 1985799 | 10578840 | p1, p2 | (NOT(Property(p1,Page(3)) == Property(p2,Page(3))) AND hasLabel(p2:PageView(0))) |
| Expand(1) | 2115767 | 2115768 | 4231538 | p1, p2 | (p1)-[:Next]->(p2) |
| NodeByLabelScan | 2115770 | 2115770 | 2115771 | p1 | :PageView |
+------------------+---------------+---------+----------+------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
3. (This one returns paths with loops, and I don't know why! I assumed that if the identifiers are different, then the nodes are different.)
match (pv1:PageView)-[:Next]->(pv2:PageView)-[:Next]->(pv3:PageView),
(pv1)-[:At]->(p1),(pv2)-[:At]->(p2),(pv3)-[:At]->(p3)
RETURN [p1.Name,p2.Name,p3.Name], count(*) as count
ORDER BY count DESC
LIMIT 10;
profile output:
10 rows
27678 ms
Compiler CYPHER 2.2-cost
Projection(0)
|
+Top
|
+EagerAggregation
|
+Projection(1)
|
+Filter(0)
|
+Expand(0)
|
+Filter(1)
|
+Expand(1)
|
+Filter(2)
|
+Expand(2)
|
+Filter(3)
|
+Expand(3)
|
+Expand(4)
|
+NodeByLabelScan
+------------------+---------------+---------+----------+------------------------------------------------+------------------------------------------------------------+
| Operator | EstimatedRows | Rows | DbHits | Identifiers | Other |
+------------------+---------------+---------+----------+------------------------------------------------+------------------------------------------------------------+
| Projection(0) | 1454 | 10 | 0 | FRESHID139, [p1.Name,p2.Name,p3.Name], count | [p1.Name,p2.Name,p3.Name], count |
| Top | 1454 | 10 | 0 | FRESHID139, count | { AUTOINT0}; count |
| EagerAggregation | 1454 | 223557 | 0 | FRESHID139, count | |
| Projection(1) | 2115760 | 2115764 | 12694584 | FRESHID139, p1, p2, p3, pv1, pv2, pv3 | |
| Filter(0) | 2115760 | 2115764 | 0 | p1, p2, p3, pv1, pv2, pv3 | (NOT(anon[116] == anon[80]) AND NOT(anon[80] == anon[98])) |
| Expand(0) | 2115760 | 2115764 | 4231530 | p1, p2, p3, pv1, pv2, pv3 | (pv1)-[:At]->(p1) |
| Filter(1) | 2115762 | 2115766 | 2115766 | p2, p3, pv1, pv2, pv3 | (hasLabel(pv1:PageView(0)) AND NOT(anon[21] == anon[45])) |
| Expand(1) | 2115762 | 2115766 | 4231532 | p2, p3, pv1, pv2, pv3 | (pv2)<-[:Next]-(pv1) |
| Filter(2) | 2115764 | 2115766 | 0 | p2, p3, pv2, pv3 | NOT(anon[116] == anon[98]) |
| Expand(2) | 2115764 | 2115766 | 4231534 | p2, p3, pv2, pv3 | (pv2)-[:At]->(p2) |
| Filter(3) | 2115766 | 2115768 | 2115768 | p3, pv2, pv3 | hasLabel(pv2:PageView(0)) |
| Expand(3) | 2115765 | 2115768 | 4231536 | p3, pv2, pv3 | (pv3)<-[:Next]-(pv2) |
| Expand(4) | 2115767 | 2115768 | 4231538 | p3, pv3 | (pv3)-[:At]->(p3) |
| NodeByLabelScan | 2115770 | 2115770 | 2115771 | pv3 | :PageView |
+------------------+---------------+---------+----------+------------------------------------------------+------------------------------------------------------------+
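Cypher's MATCH keeps each relationship distinct within a pattern, but it does not require differently named identifiers to bind to different nodes, so p1, p2 and p3 in the third attempt can all be the same :Page node. (The Filter operators in its profile appear to compare only the anonymous relationships.) The following sketch, untested against this data set, excludes repeated pages explicitly by comparing node identity. Assuming each PageView's Page property always matches the Name of the :Page node it points to via :At, it should give the same counts as the second attempt.

```cypher
// Sketch: third attempt with loops removed by node-identity checks
MATCH (pv1:PageView)-[:Next]->(pv2:PageView)-[:Next]->(pv3:PageView),
      (pv1)-[:At]->(p1), (pv2)-[:At]->(p2), (pv3)-[:At]->(p3)
// MATCH only guarantees distinct relationships, so distinct pages must be required here
WHERE p1 <> p2 AND p1 <> p3 AND p2 <> p3
RETURN [p1.Name, p2.Name, p3.Name], COUNT(*) AS count
ORDER BY count DESC
LIMIT 10;
```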
System info:
windows 8.1
250G ssd
neo4j enterprise 2.2.0-M02
cache: hpc
ram: 8G
jvm heap size: 4G
memory mapping: 50%
149 (:Page) nodes
2115770 (:PageView) nodes
Why is even the fastest of these three methods so slow? (I assume all my data is in RAM.)
What is the best way to filter out paths with loops?