I have the following graph model:

(:PageView {Number:int, Page:string}), (:Page {Name:string})
(:PageView)-[:At]->(:Page)
(:PageView)-[:Next]->(:PageView)

Schema:

Indexes
  ON :Page(Name)            ONLINE (for uniqueness constraint)
  ON :PageView(Page)        ONLINE
  ON :PageView(Revision)    ONLINE (for uniqueness constraint)

Constraints
  ON (pageview:PageView) ASSERT pageview.Number IS UNIQUE
  ON (page:Page) ASSERT page.Name IS UNIQUE

I want to do something similar to this post

I have tried to find popular paths of this structure that contain no loops:

(:PageView)-[:Next*2]->(:PageView)

Here are my attempts:
1. Nicole White's method from the post

MATCH p = (:PageView)-[:Next*2]->(:PageView)
WITH p, EXTRACT(v IN NODES(p) | v.Page) AS pages 
UNWIND pages AS views 
WITH p, COUNT(DISTINCT views) AS distinct_views 
WHERE distinct_views = LENGTH(NODES(p)) 
RETURN EXTRACT(v in NODES(p) | v.Page), count(p)
ORDER BY count(p) DESC
LIMIT 10;

profile output:

10 rows
177270 ms

Compiler CYPHER 2.2-rule

ColumnFilter(0)
  |
  +Extract(0)
    |
    +ColumnFilter(1)
      |
      +Top
        |
        +EagerAggregation(0)
          |
          +Extract(1)
            |
            +ColumnFilter(2)
              |
              +Filter(0)
                |
                +Extract(2)
                  |
                  +ColumnFilter(3)
                    |
                    +EagerAggregation(1)
                      |
                      +UNWIND
                        |
                        +ColumnFilter(4)
                          |
                          +Extract(3)
                            |
                            +ExtractPath
                              |
                              +Filter(1)
                                |
                                +TraversalMatcher

+---------------------+---------+----------+------------------------------------------------------------------------+------------------------------------------------------------------------------------------------+
|            Operator |    Rows |   DbHits |                                                            Identifiers |                                                                                          Other |
+---------------------+---------+----------+------------------------------------------------------------------------+------------------------------------------------------------------------------------------------+
|     ColumnFilter(0) |      10 |        0 |                              EXTRACT(v in NODES(p) | v.Page), count(p) |                                         keep columns EXTRACT(v in NODES(p) | v.Page), count(p) |
|          Extract(0) |      10 |        0 |    FRESHID225,   FRESHID258, EXTRACT(v in NODES(p) | v.Page), count(p) |                                                      EXTRACT(v in NODES(p) | v.Page), count(p) |
|     ColumnFilter(1) |      10 |        0 |                                               FRESHID225,   FRESHID258 |                                                                                keep columns ,  |
|                 Top |      10 |        0 |   FRESHID225,   INTERNAL_AGGREGATEf7fa022b-cdb5-4ef2-bec5-a2f4f10706b6 | {  AUTOINT0}; Cached(  INTERNAL_AGGREGATEf7fa022b-cdb5-4ef2-bec5-a2f4f10706b6 of type Integer) |
| EagerAggregation(0) |  212828 |        0 |   FRESHID225,   INTERNAL_AGGREGATEf7fa022b-cdb5-4ef2-bec5-a2f4f10706b6 |                                                                                                |
|          Extract(1) | 1749120 | 10494720 |                                          FRESHID225, distinct_views, p |                                                                                                |
|     ColumnFilter(2) | 1749120 |        0 |                                                      distinct_views, p |                                                                 keep columns distinct_views, p |
|           Filter(0) | 1749120 |        0 |                                          FRESHID196, distinct_views, p |                                                                    CoercedPredicate(anon[196]) |
|          Extract(2) | 2115766 |        0 |                                          FRESHID196, distinct_views, p |                                                                                                |
|     ColumnFilter(3) | 2115766 |        0 |                                                      distinct_views, p |                                                                 keep columns p, distinct_views |
| EagerAggregation(1) | 2115766 |        0 |              INTERNAL_AGGREGATEb0939c81-a40c-4012-afd6-4852b17cf2e4, p |                                                                                              p |
|              UNWIND | 6347298 |        0 |                                                        p, pages, views |                                                                                                |
|     ColumnFilter(4) | 2115766 |        0 |                                                               p, pages |                                                                          keep columns p, pages |
|          Extract(3) | 2115766 | 12694596 |                                                               p, pages |                                                                                          pages |
|         ExtractPath | 2115766 |        0 |                                                                      p |                                                                                                |
|           Filter(1) | 2115766 |  2115766 |                                                                        |                                                                 hasLabel(anon[34]:PageView(0)) |
|    TraversalMatcher | 2115766 | 16926150 |                                                                        |                                                                                         , , ,  |
+---------------------+---------+----------+------------------------------------------------------------------------+------------------------------------------------------------------------------------------------+

Total database accesses: 42231232


2.

match (p1:PageView)-[:Next]->(p2:PageView)-[:Next]->(p3:PageView)
where p1.Page<>p2.Page and p1.Page<>p3.Page and p2.Page<>p3.Page
RETURN [p1.Page,p2.Page,p3.Page], count(*) as count
ORDER BY count DESC
LIMIT 10;

profile output:

10 rows
28660 ms

Compiler CYPHER 2.2-cost

Projection(0)
  |
  +Top
    |
    +EagerAggregation
      |
      +Projection(1)
        |
        +Filter(0)
          |
          +Expand(0)
            |
            +Filter(1)
              |
              +Expand(1)
                |
                +NodeByLabelScan

+------------------+---------------+---------+----------+------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|         Operator | EstimatedRows |    Rows |   DbHits |                                    Identifiers |                                                                                                                                                                    Other |
+------------------+---------------+---------+----------+------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|    Projection(0) |          1241 |      10 |        0 |   FRESHID146, [p1.Page,p2.Page,p3.Page], count |                                                                                                                                         [p1.Page,p2.Page,p3.Page], count |
|              Top |          1241 |      10 |        0 |                              FRESHID146, count |                                                                                                                                                      {  AUTOINT0}; count |
| EagerAggregation |          1241 |  212828 |        0 |                              FRESHID146, count |                                                                                                                                                                          |
|    Projection(1) |       1542393 | 1749120 | 10494720 |                         FRESHID146, p1, p2, p3 |                                                                                                                                                                          |
|        Filter(0) |       1542393 | 1749120 | 17872173 |                                     p1, p2, p3 | (((hasLabel(p3:PageView(0)) AND NOT(Property(p1,Page(3)) == Property(p3,Page(3)))) AND NOT(anon[20] == anon[43])) AND NOT(Property(p2,Page(3)) == Property(p3,Page(3)))) |
|        Expand(0) |       1904189 | 1985797 |  3971596 |                                     p1, p2, p3 |                                                                                                                                                       (p2)-[:Next]->(p3) |
|        Filter(1) |       1904191 | 1985799 | 10578840 |                                         p1, p2 |                                                                                         (NOT(Property(p1,Page(3)) == Property(p2,Page(3))) AND hasLabel(p2:PageView(0))) |
|        Expand(1) |       2115767 | 2115768 |  4231538 |                                         p1, p2 |                                                                                                                                                       (p1)-[:Next]->(p2) |
|  NodeByLabelScan |       2115770 | 2115770 |  2115771 |                                             p1 |                                                                                                                                                                :PageView |
+------------------+---------------+---------+----------+------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

3. (With loops!? And I don't know why! I assumed that if the identifiers are different, then the nodes are different.)

match (pv1:PageView)-[:Next]->(pv2:PageView)-[:Next]->(pv3:PageView),
(pv1)-[:At]->(p1),(pv2)-[:At]->(p2),(pv3)-[:At]->(p3)
RETURN [p1.Name,p2.Name,p3.Name], count(*) as count
ORDER BY count DESC
LIMIT 10;

profile output:

10 rows
27678 ms

Compiler CYPHER 2.2-cost

Projection(0)
  |
  +Top
    |
    +EagerAggregation
      |
      +Projection(1)
        |
        +Filter(0)
          |
          +Expand(0)
            |
            +Filter(1)
              |
              +Expand(1)
                |
                +Filter(2)
                  |
                  +Expand(2)
                    |
                    +Filter(3)
                      |
                      +Expand(3)
                        |
                        +Expand(4)
                          |
                          +NodeByLabelScan

+------------------+---------------+---------+----------+------------------------------------------------+------------------------------------------------------------+
|         Operator | EstimatedRows |    Rows |   DbHits |                                    Identifiers |                                                      Other |
+------------------+---------------+---------+----------+------------------------------------------------+------------------------------------------------------------+
|    Projection(0) |          1454 |      10 |        0 |   FRESHID139, [p1.Name,p2.Name,p3.Name], count |                           [p1.Name,p2.Name,p3.Name], count |
|              Top |          1454 |      10 |        0 |                              FRESHID139, count |                                        {  AUTOINT0}; count |
| EagerAggregation |          1454 |  223557 |        0 |                              FRESHID139, count |                                                            |
|    Projection(1) |       2115760 | 2115764 | 12694584 |          FRESHID139, p1, p2, p3, pv1, pv2, pv3 |                                                            |
|        Filter(0) |       2115760 | 2115764 |        0 |                      p1, p2, p3, pv1, pv2, pv3 | (NOT(anon[116] == anon[80]) AND NOT(anon[80] == anon[98])) |
|        Expand(0) |       2115760 | 2115764 |  4231530 |                      p1, p2, p3, pv1, pv2, pv3 |                                          (pv1)-[:At]->(p1) |
|        Filter(1) |       2115762 | 2115766 |  2115766 |                          p2, p3, pv1, pv2, pv3 |  (hasLabel(pv1:PageView(0)) AND NOT(anon[21] == anon[45])) |
|        Expand(1) |       2115762 | 2115766 |  4231532 |                          p2, p3, pv1, pv2, pv3 |                                       (pv2)<-[:Next]-(pv1) |
|        Filter(2) |       2115764 | 2115766 |        0 |                               p2, p3, pv2, pv3 |                                 NOT(anon[116] == anon[98]) |
|        Expand(2) |       2115764 | 2115766 |  4231534 |                               p2, p3, pv2, pv3 |                                          (pv2)-[:At]->(p2) |
|        Filter(3) |       2115766 | 2115768 |  2115768 |                                   p3, pv2, pv3 |                                  hasLabel(pv2:PageView(0)) |
|        Expand(3) |       2115765 | 2115768 |  4231536 |                                   p3, pv2, pv3 |                                       (pv3)<-[:Next]-(pv2) |
|        Expand(4) |       2115767 | 2115768 |  4231538 |                                        p3, pv3 |                                          (pv3)-[:At]->(p3) |
|  NodeByLabelScan |       2115770 | 2115770 |  2115771 |                                            pv3 |                                                  :PageView |
+------------------+---------------+---------+----------+------------------------------------------------+------------------------------------------------------------+

System info:
windows 8.1
250G ssd
neo4j enterprise 2.2.0-M02
cache: hpc
ram: 8G
jvm heap size: 4G
memory mapping: 50%
149 (:Page) nodes
2115770 (:PageView) nodes

Why is even the fastest of these three methods so slow? (I guess that all my data is in RAM.)
What is the best way to filter out paths with loops?

mif

1 Answer

By specifying labels on every identifier, you force Cypher to open each node's header and check its labels.

This is where relationship types matter. Relationships are what drive the traversal through the graph, so for performance there is no need to specify labels on every node. If you're sure that all nodes along the path have the PageView label, omit the label everywhere except at the start of your query:

match (p1:PageView)-[:Next]->(p2)-[:Next]->(p3)
where p1.Page<>p2.Page and p1.Page<>p3.Page and p2.Page<>p3.Page
RETURN [p1.Page,p2.Page,p3.Page], count(*) as count
ORDER BY count DESC
LIMIT 10;

I posted some query plan results in an answer related to your question: Neo4j: label vs. indexed property?
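
As for the loops in the third query from the question: within a single MATCH, Cypher only guarantees that the same relationship is not matched twice. Different identifiers may still bind to the same node. The three :At relationships are always distinct, so p1, p2 and p3 can all be the same Page. A minimal sketch of one way to exclude such rows, assuming the schema from the question, is to compare the Page nodes explicitly (the `pages` alias is only illustrative):

```cypher
// Same pattern as the third query in the question.
MATCH (pv1:PageView)-[:Next]->(pv2:PageView)-[:Next]->(pv3:PageView),
      (pv1)-[:At]->(p1), (pv2)-[:At]->(p2), (pv3)-[:At]->(p3)
// Added: require three different Page nodes, since Cypher
// does not enforce node uniqueness on its own.
WHERE p1 <> p2 AND p1 <> p3 AND p2 <> p3
RETURN [p1.Name, p2.Name, p3.Name] AS pages, count(*) AS count
ORDER BY count DESC
LIMIT 10;
```

This has not been run against the data from the question, so how its timing compares with the second query is unknown.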

Christophe Willemsen

Thanks for your answer! But your solution saves only 4 or maybe 5 seconds. Why is traversing such a small graph in RAM so slow? My question was about why that code takes about 20 seconds to execute rather than about 1. Can I see which stage of the query is the slowest? – mif Jan 31 '15 at 07:31