4

I'm trying to build this model in Neo4j http://blog.neo4j.org/2012/02/modeling-multilevel-index-in-neoj4.html

Now I need a query to find out whether a timeline path already exists. The problem is that I can't seem to make a node optional and at the same time check whether a property exists.

Ideally i would like this query:

START a=node(2)
MATCH a:timeline -[?]-> b:time_year -[?]-> c:time_month -[?]-> d:time_day
WHERE b.year = 2013 AND c.month = 7 AND d.day = 28
RETURN b, c, d

But since labels on optional nodes don't seem to be supported, I'm going with this query:

START a=node(2)
MATCH a:timeline -[?]-> b -[?]-> c -[?]-> d
WHERE b.year = 2013 AND c.month = 7 AND d.day = 28
RETURN b, c, d

Without the WHERE clause this would return:

b    c    d
null null null

That result is fine, because it tells me I have to create a year, month and day node. But without the WHERE clause I can't specify the date I want, which renders the entire query useless.
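
The desired behaviour can be modeled outside Cypher. The following is a minimal Python sketch, assuming a hypothetical in-memory timeline stored as nested dicts (year -> month -> set of days); it has nothing to do with how Neo4j stores or queries data. It only shows the intended result shape: each level is optional, and the first missing level yields None for itself and for everything below it.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical in-memory stand-in for the timeline: year -> month -> days.
Timeline = Dict[int, Dict[int, Set[int]]]


def find_path(timeline: Timeline, year: int, month: int, day: int
              ) -> Tuple[Optional[int], Optional[int], Optional[int]]:
    """Return (year, month, day), with None from the first missing level down."""
    months = timeline.get(year)
    if months is None:
        return (None, None, None)
    days = months.get(month)
    if days is None:
        return (year, None, None)
    if day not in days:
        return (year, month, None)
    return (year, month, day)


timeline: Timeline = {2010: {1: {1}}}
print(find_path(timeline, 2013, 7, 28))  # nothing exists yet
print(find_path(timeline, 2010, 7, 28))  # only the year exists
print(find_path(timeline, 2010, 1, 1))   # the full path exists
```

The None entries are exactly the nodes that would still have to be created.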

I'm using Neo4j 2.0.0-M03

UPDATE: To clarify why the labels are not working, here is what happens on a fresh database.

In console:

neo4j-sh (0)$ CREATE (n:timeline) RETURN n;
==> +-----------+
==> | n         |
==> +-----------+
==> | Node[1]{} |
==> +-----------+
==> 1 row
==> Nodes created: 1
==> Labels added: 1
==> 967 ms
neo4j-sh (0)$ START a=node(1) MATCH a:timeline -[?]-> b:time_year -[?]-> c:time_month -[?]-> d:time_day WHERE b.year = 2013 AND c.month = 7 AND d.day = 28 RETURN b as year, c as month, d as day;
==> Unrecognized option '['

In data browser:

START a=node(1) MATCH a:timeline -[?]-> b:time_year -[?]-> c:time_month -[?]-> d:time_day WHERE b.year = 2013 AND c.month = 7 AND d.day = 28 RETURN b as year, c as month, d as day;
Returned 0 rows. Query took 138ms
Not found
There is no data matching your query in the database.

I just found out that these queries work in my code, but not in the Neo4j console or data browser. I had assumed those would be "flawless" and so didn't test the queries in my code first. It's also odd that the console and data browser give different results. Using node(*) instead of node(1) in START doesn't make a difference.

Update 2: I played around a little more with the example gist posted by Peter Neubauer. The problem is that this example returns everything or nothing, whereas I want each returned column to be optional. So in this example I would like Query3 to return:

Columns: year, month, day
Data: 2010, null, null

But it returns:

Query took 264 ms and returned no rows.

When i make the properties optional like this:

START a=node(*) 
MATCH a:timeline -[?]-> b:time_year -[?]-> c:time_month -[?]-> d:time_day 
WHERE b.year? = 2010 AND c.month? = 1 AND d.day? = 1 
RETURN b AS year, c AS month, d AS day

I get (on the gists website):

 Error: java.lang.NullPointerException

Here is the kicker: this only happens in Firefox. In Chrome it returns: Query took 1 ms and returned no rows.

Then this query:

START a=node(*) 
MATCH a:timeline -[?]-> b -[?]-> c -[?]-> d 
WHERE b.year? = 2010 AND c.month? = 1 AND d.day? = 1 
RETURN b AS year, c AS month, d AS day

Returns:

Columns: year, month, day
Data: (9:time_year {name:"Year 2010", year:2010}), [empty table cell], [empty table cell]

This is the result I want (but without making use of labels).

Then the same query on Chrome:

Query took 4 ms and returned no rows.

So my conclusion thus far is:
There are 5 different environments giving different results:

  • My own code, making use of php4neoj by jadell
  • The web interface, using the data browser
  • The web interface, using the console
  • The gists website http://gist.neo4j.org, using Firefox
  • The gists website http://gist.neo4j.org, using Chrome

I have not tried yet:

  • The console started from the batch file
  • The gists website http://gist.neo4j.org, using Opera
  • The Neo4j console at http://console.neo4j.org/ (I expect this to give the same results as the gists because it looks the same, but I haven't tested this yet)

So I have a bunch of query variations and a bunch of environments. Perhaps this could be scripted, so that each query is run in each environment and the results are collected. Then I could put the results in a table where one axis is the environment and the other is the query under test.
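
One way such a script could be structured is sketched below in Python. Everything here is an assumption for illustration: each environment is represented by a hypothetical "runner" callable that takes a query string and returns a short summary, and the two stub runners merely simulate a working and a failing environment. Real runners (REST calls, shell invocations, browser automation) would have to be written separately.

```python
from typing import Callable, Dict, List

# A runner takes a query string and returns a short summary of the outcome.
Runner = Callable[[str], str]


def run_matrix(runners: Dict[str, Runner],
               queries: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Run every query in every environment; errors are recorded, not raised."""
    results: Dict[str, Dict[str, str]] = {}
    for query_name, query in queries.items():
        row: Dict[str, str] = {}
        for env_name, runner in runners.items():
            try:
                row[env_name] = runner(query)
            except Exception as exc:
                row[env_name] = "error: " + type(exc).__name__
        results[query_name] = row
    return results


def format_table(results: Dict[str, Dict[str, str]], env_names: List[str]) -> str:
    """One line per query, one column per environment."""
    lines = ["query | " + " | ".join(env_names)]
    for query_name, row in results.items():
        lines.append(query_name + " | " + " | ".join(row[e] for e in env_names))
    return "\n".join(lines)


# Stub runners standing in for real environments (hypothetical behaviour).
def stub_ok(query: str) -> str:
    return "0 rows"


def stub_broken(query: str) -> str:
    raise RuntimeError("simulated failure")


runners = {"env_a": stub_ok, "env_b": stub_broken}
queries = {"q1": "START a=node(1) RETURN a"}
print(format_table(run_matrix(runners, queries), list(runners)))
```

Catching the exception per cell keeps one broken environment from hiding the results of the others, which matters here because the environments disagree.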

Flip

4 Answers

1

Do this for each of the optional label constraints to work around it:

CASE
    WHEN b IS NOT NULL AND ANY(x IN LABELS(b) WHERE x="time_year") THEN b
    ELSE NULL 
END AS newB

Since b is any type of node, this will make sure that if it exists, it is the proper label. It's a bit verbose, but oh well.
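
For readers who find the CASE expression hard to parse, its logic can be restated as a small Python model. The Node class and the keep_if_labeled helper are hypothetical names invented for this sketch; they are not part of any Neo4j driver.

```python
from typing import Optional, Set


class Node:
    """Hypothetical stand-in for a graph node; not a Neo4j driver class."""

    def __init__(self, labels: Set[str]) -> None:
        self.labels = labels


def keep_if_labeled(node: Optional[Node], label: str) -> Optional[Node]:
    # Mirrors the CASE above: keep the node only if it exists AND carries
    # the wanted label; otherwise behave as if the optional match found nothing.
    if node is not None and label in node.labels:
        return node
    return None


year_node = Node({"time_year"})
other_node = Node({"timeline"})
print(keep_if_labeled(year_node, "time_year") is year_node)  # node is kept
print(keep_if_labeled(other_node, "time_year"))              # wrong label
print(keep_if_labeled(None, "time_year"))                    # no node matched
```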

--edit-- Based on Flip's comment about indexes, I did the following (the execution plans are from the second run of each query):

CREATE INDEX ON :time_year(year);
create 
(_7:timeline ),
(_8:time_year  {year:2010}),
(_9:timeline ),
_9-[:HAS_YEAR]->_8;

First Idea

MATCH a:timeline WITH a
MATCH a-[?]->b 
WITH a, b,  CASE WHEN b IS NOT NULL AND ANY (x IN LABELS(b) 
                                             WHERE x="time_year")  THEN b  ELSE NULL END AS newB 
WHERE newB.year? = 2010 
RETURN a, newB AS year

Detailed Query Results
Query Results

+--------------------------------+
| a         | year               |
+--------------------------------+
| Node[7]{} | Node[8]{year:2010} |
| Node[9]{} | <null>             |
+--------------------------------+
2 rows
2 ms

Execution Plan

ColumnFilter(symKeys=["a", "b", "newB", "year"], returnItemNames=["a", "year"], _rows=2, _db_hits=0)
Extract(symKeys=["a", "b", "newB"], exprKeys=["year"], _rows=2, _db_hits=0)
  Filter(pred="nullable([($anonfun$nullableProperty$3$$anonfun$apply$21$$anon$1,true)],[$anonfun$nullableProperty$3$$anonfun$apply$21$$anon$1 == Literal(2010)])", _rows=2, _db_hits=2)
    ColumnFilter(symKeys=["a", "b", "  UNNAMED33", "newB"], returnItemNames=["a", "b", "newB"], _rows=2, _db_hits=0)
      Extract(symKeys=["a", "b", "  UNNAMED33"], exprKeys=["newB"], _rows=2, _db_hits=0)
        PatternMatch(g="(a)-['  UNNAMED33']-(b)", _rows=2, _db_hits=1)
          PatternMatch(g="", _rows=2, _db_hits=0)
            Filter(pred="hasLabel(a: timeline)", _rows=2, _db_hits=0)
              NodeByLabel(label="timeline", identifier="a", _rows=2, _db_hits=0)

Second Idea

MATCH a:timeline WITH a
MATCH a-[?]->b 
WHERE b.year? = 2010 
RETURN a, b AS year

Detailed Query Results
Query Results

+--------------------------------+
| a         | year               |
+--------------------------------+
| Node[7]{} | Node[8]{year:2010} |
| Node[9]{} | <null>             |
+--------------------------------+
2 rows
2 ms

Execution Plan

ColumnFilter(symKeys=["a", "b", "  UNNAMED33", "year"], returnItemNames=["a", "year"], _rows=2, _db_hits=0)
Extract(symKeys=["a", "b", "  UNNAMED33"], exprKeys=["year"], _rows=2, _db_hits=0)
  Filter(pred="nullable([($anonfun$nullableProperty$3$$anonfun$apply$21$$anon$1,true)],[$anonfun$nullableProperty$3$$anonfun$apply$21$$anon$1 == Literal(2010)])", _rows=2, _db_hits=2)
    PatternMatch(g="(a)-['  UNNAMED33']-(b)", _rows=2, _db_hits=3)
      PatternMatch(g="", _rows=2, _db_hits=0)
        Filter(pred="hasLabel(a: timeline)", _rows=2, _db_hits=0)
          NodeByLabel(label="timeline", identifier="a", _rows=2, _db_hits=0)

What would make this a little cleaner is if we just had a NULLIF function available in Cypher. In either case, though, it doesn't look like any index is used when checking the node at the other end of the relationship.

LameCoder
  • Will that make use of the indexes that have been created? Or will the CASE be executed after a search over all the nodes has been performed? – Flip Aug 09 '13 at 05:40
  • Are you asking if it would use the index on the Label or on the property? I'm guessing that even if non-optional, it would have to do a WHERE on the label first. – LameCoder Aug 09 '13 at 13:12
  • With the execution plans you posted I now know that I am not losing performance by using my original solution. Since I don't actually rely on labels for identification in my current scheme (the next node always has the same label), for my use case the tradeoff is merely down to the clarity of the query. In that case I will go with the solution I had. Thanks for your investigation, I'll accept your answer. And yes, let's hope that in the future it will make use of optional labels. – Flip Aug 10 '13 at 12:39
0

Got the answer from this page: http://grokbase.com/t/gg/neo4j/137qbdyn14/use-label-in-start

START a=node(2)
MATCH a:timeline -[?]-> b -[?]-> c -[?]-> d
WHERE b.year? = 2013 AND c.month? = 7 AND d.day? = 28
RETURN b, c, d

However, the additional question about labels remains a mystery for now.

Flip
0

The correct answer to this question is that this feature is not available in Neo4j 2.0.0 Milestone 03 (whether that counts as a missing feature or a bug), but it has been added in Neo4j 2.0.0 Milestone 05. Labels on optional nodes no longer stop the whole MATCH clause from returning results.

Source: https://github.com/neo4j/neo4j/blob/master/packaging/standalone/standalone-community/src/main/distribution/text/community/CHANGES.txt

(I have not tested this yet)

Flip
-1

I made a graph gist on this, see http://gist.neo4j.org/?6113785 and it seems the labels ARE working. Wanna contribute and clarify?

Peter Neubauer