Injective Matching in Neo4j

Question

I'm looking for an efficient way to do injective matching in Neo4j. By that I simply mean that matches should be returned only where every node in a match is unique (e.g. has a unique ID), and the same holds for paths.

Using the diagram from Wikipedia (above), the domain, X, of the matching is the set of nodes and paths in the pattern, and the co-domain, Y, is the database's internal graph. The diagram is injective because no two arrows from X point to the same element in Y: no two nodes from the pattern are matched to the same node in the graph, and the same holds for edges. Default Neo4j matching, by contrast, is non-injective and allows two nodes from the pattern to be matched to the same node in the graph (picture the arrows from 1 and 2 in X both pointing to D in Y). Conventional graph theory would call matching multiple items from the domain X to the same item in the co-domain Y "merging", but I can appreciate that this terminology may be confusing in this context.

I can simulate this for specific queries by specifying that matched nodes are distinct:

match (a), (b) where not id(a) = id(b) return a, b

But I want to do it in a general sense without having to be this explicit in every query. So for this example I would like to return matches where (a) and (b) are unique nodes but I would like to do this with some general behaviour rather than specifying the uniqueness based on ID.

It does seem that paths are already guaranteed to be unique when I query, but if someone could confirm that, that would be great.

I think one reason is that a few terms you're using are already used in Neo4j and mean different things. For example, "where matched nodes and paths are not merged" is rather confusing for someone who is familiar with Neo4j terminology for "merge" (in Neo4j MERGE is to create or match to graph elements...a MATCH if it exists, a CREATE if it does not). Can you describe that requirement in different terms? — InverseFalcon, Apr 05 '17 at 10:50
@InverseFalcon A fair comment, the only term I can think of that will work is that matched nodes and paths are unique (distinct would work but obviously that is used for a different purpose in queries). — Tim Atkinson, Apr 05 '17 at 10:54

score 1 · Answer 1 · Tomaž Bratanič · 2017-03-29T15:46:16.540

I am not really sure what exactly you want, but I can make your query more efficient.

MATCH (a), (b) WHERE id(a) < id(b)
RETURN a, b
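To see what the switch from an inequality test to id(a) < id(b) changes, here is a small Python sketch over a hypothetical list of node IDs (the IDs and function names are made up for illustration and are not part of Neo4j). The inequality form keeps both (a, b) and (b, a), while the < form keeps each unordered pair once, so it produces half as many rows but also a different result set.

```python
from itertools import product

def pairs_not_equal(ids):
    """Mimics the filter in: MATCH (a), (b) WHERE NOT id(a) = id(b)."""
    return [(a, b) for a, b in product(ids, repeat=2) if a != b]

def pairs_less_than(ids):
    """Mimics the filter in: MATCH (a), (b) WHERE id(a) < id(b)."""
    return [(a, b) for a, b in product(ids, repeat=2) if a < b]

node_ids = [0, 1, 2]  # hypothetical internal node IDs
print(len(pairs_not_equal(node_ids)))  # 6 ordered pairs
print(pairs_less_than(node_ids))       # [(0, 1), (0, 2), (1, 2)]
```

If both orderings of a pair count as separate matches for your purposes, the inequality form is the one that preserves them.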

If you want to return distinct nodes or relationships, Cypher has the DISTINCT keyword. Example:

MATCH (a)-->(b)
RETURN DISTINCT a

P.S. Always use labels for nodes, as this speeds up query execution.

edited Mar 29 '17 at 15:46

answered Mar 29 '17 at 15:41

Tomaž Bratanič

So Match (a), (b) return distinct(a) distinct(b) will ensure that a and b are matched to separate nodes? If so, this is exactly what I was looking for – Tim Atkinson Mar 29 '17 at 16:09
No, unfortunately not. It just takes a list and removes duplicates from it. What you want is the first query I wrote. Can you be more specific about the use case? – Tomaž Bratanič Mar 29 '17 at 16:13
Unfortunately not really; I'm trying to build a very general model and right now want to set up injective matching in a general sense. If you're unsure what I mean by injective matching: http://www.mathsisfun.com/sets/injective-surjective-bijective.html If I have a set of nodes (v1, v2, ... vn) and edges (e1, e2, .. em), I want to find all valid matches for that set of nodes and edges where no two nodes or edges have been merged. I can generate a WHERE clause as you described for each pair, but the number of clauses blows up quadratically as the number of nodes/edges I want to match increases. – Tim Atkinson Mar 29 '17 at 16:17

score 1 · Accepted Answer · answered Apr 05 '17 at 11:10

If you just want to make sure the nodes bound to the variables in your match are not the same, you can install APOC Procedures and make use of some of its collection helper functions, specifically apoc.coll.containsDuplicates().

An example usage might look like:

MATCH p1=(a)-[r]->(b), (c)
WHERE NOT apoc.coll.containsDuplicates([a, b, c])
RETURN a, b, c, p1

answered Apr 05 '17 at 11:10

InverseFalcon

Trying to give it a go, but I'm struggling to get APOC installed properly (my neo4j.conf is missing -.-). I'll come back to this once I've tested, but it does seem to be right :) – Tim Atkinson Apr 05 '17 at 11:28
You shouldn't need to mess with the conf file to use APOC, just drop the jar in the plugins folder and restart neo4j. The trickiest part is just downloading the correct APOC version according to your Neo4j version, there should be a version matrix to help. – InverseFalcon Apr 05 '17 at 11:34
Ah, I got it working! The issue was that I hadn't touched the conf file, so Neo4j wasn't pointing at my plugins directory at all. This seems to work perfectly, but I'll keep my answer up as legacy (in case APOC support stops or whatever). As a note, I think it might be more efficient to focus on IDs, e.g. NOT apoc.coll.containsDuplicates([id(a), id(b), id(c)]), but I'm honestly not sure. – Tim Atkinson Apr 05 '17 at 11:36
The nodes that are passed in are lightweight objects. Under the hood, the function creates a set from the given collection (where the node id is the basis for equality/hashcode) and compares the size of the set with the size of the collection. If the set is smaller, it returns true, meaning there are duplicates. – InverseFalcon Apr 05 '17 at 11:48
Ah nice, ignore my thought then! Thanks for the good answer and discourse :) – Tim Atkinson Apr 05 '17 at 11:51
Not a problem, glad to help! – InverseFalcon Apr 05 '17 at 12:01
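The set-size check described in the comments above can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not APOC's actual source; plain integers stand in for node IDs, and the function name is hypothetical.

```python
def contains_duplicates(items):
    """Return True if any value occurs more than once in items."""
    # Building a set collapses equal values, so a smaller set
    # means at least one value was repeated in the input.
    return len(set(items)) < len(items)

print(contains_duplicates([10, 11, 12]))  # False: all IDs differ
print(contains_duplicates([10, 11, 10]))  # True: ID 10 appears twice
```

Because the check is a single pass to build the set, its cost grows with the number of variables in the list rather than with the number of pairs.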

score 0 · Answer 3 · edited May 23 '17 at 12:25

Edit: @InverseFalcon has given a better solution using the APOC plugin, but if you cannot use APOC for whatever reason, or APOC support stops, then you can check the uniqueness of IDs manually as described here.

Paths are unique by design (within a single MATCH clause, Cypher will not match the same relationship twice), so injective matching can be achieved by simply ensuring that nodes are unique. Taking inspiration from this answer, apparently you cannot wrap a query to achieve this, but a short ALL statement in the WHERE clause at least simplifies the ID checks.

If the pattern you want to match is described by:

MATCH p1=(a)-[r]->(b), (c)
RETURN a, b, c, p1

which is a very simple pattern (any three nodes with any relationship from the first to the second), then you can ensure the uniqueness of those three nodes by checking each node's ID against the others:

MATCH p1=(a)-[r]->(b), (c)
// keep a row only if each node's ID occurs exactly once in [a, b, c]
WHERE ALL(n IN [a, b, c] WHERE
          1 = size([m IN [a, b, c] WHERE id(m) = id(n)]))
RETURN a, b, c, p1

This works by taking each node in the set of nodes we wish to be unique ([a, b, c]) and comparing its ID against the ID of every node in that set, making sure that exactly one ID matches (its own). This is the reason for the "1 = size(...)" part.
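The same counting check can be mirrored outside the database. The following Python sketch is only an illustration of the logic, with plain integers standing in for node IDs and a hypothetical function name: a list passes when every ID matches exactly one entry, namely itself.

```python
def all_ids_unique(ids):
    """True when each ID occurs exactly once in ids.

    Mirrors the ALL(...) condition: for every n, count the
    entries m with m == n and require that count to be 1.
    """
    return all(sum(1 for m in ids if m == n) == 1 for n in ids)

print(all_ids_unique([1, 2, 3]))  # True
print(all_ids_unique([1, 2, 1]))  # False: ID 1 matches two entries
```

Note that although the condition is written once, evaluating it as written still compares every ID with every other ID, so the work done per candidate row is quadratic in the number of nodes.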

So, using this idea generally, returned nodes are guaranteed to be unique and returned paths from the pattern are guaranteed to be unique (by Neo4j's design), making the matching process injective.

It's not an ideal solution, as the statement must be custom-written for each query, but it's at least a single WHERE condition whose length grows linearly with the number of nodes, instead of a separate condition for each pair of nodes, where the number of conditions grows quadratically (n(n-1)/2 for n nodes).
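Since the condition has to be written per query, one option is to generate it from the list of pattern variables. The Python sketch below is a hypothetical helper, not part of Neo4j or any driver: it builds the pairwise form as a string and counts how many conditions that form needs, which makes the growth easy to check.

```python
from itertools import combinations

def pairwise_where(variables):
    """Build one id(x) <> id(y) condition per unordered pair."""
    conds = [f"id({x}) <> id({y})" for x, y in combinations(variables, 2)]
    # Fewer than two variables means there is nothing to compare.
    return "WHERE " + " AND ".join(conds) if conds else ""

def pairwise_condition_count(n):
    """Number of unordered pairs among n variables."""
    return n * (n - 1) // 2

print(pairwise_where(["a", "b", "c"]))
# WHERE id(a) <> id(b) AND id(a) <> id(c) AND id(b) <> id(c)
print(pairwise_condition_count(10))  # 45
```

The generated string would still need to be spliced into the query by whatever code builds it; the sketch only covers the WHERE part.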

edited May 23 '17 at 12:25

Community

answered Apr 05 '17 at 10:10

Tim Atkinson

Will mark this as the answer unless a better one shows up – Tim Atkinson Apr 05 '17 at 10:11
This almost looks like you're trying to find all combinations of nodes of a certain set size, with no repeats. Is that close? – InverseFalcon Apr 05 '17 at 10:32
In this specific example yes, but more generally I just want to return matches with no merged nodes or paths, and making sure there are no repeats in ID guarantees this. This should work for matching generally by putting all of the nodes from any arbitrary pattern in the set inside the ALL clause. – Tim Atkinson Apr 05 '17 at 10:37
Again, I want to use Neo4j's matching function but I want it to operate injectively (see http://www.mathsisfun.com/sets/injective-surjective-bijective.html) whereas it by default does not (nor is it surjective nor bijective but you can ignore those for this context). – Tim Atkinson Apr 05 '17 at 10:40
@InverseFalcon I added a path to the example to make it clearer that I want to do this for general pattern matching – Tim Atkinson Apr 05 '17 at 10:47
Just to note, you'll probably want to profile that kind of match. This could be doing a cartesian product of a, b, and c, and only afterwards filtering based on the second part of the match with the relationship. Better to not repeat nodes that you use in the relationship, like so: `MATCH p1=(a)-[r]->(b), (c)`. Also, if you have labels in your graph, I'd recommend using those to improve your query performance, otherwise all nodes in the graph will be considered. – InverseFalcon Apr 05 '17 at 11:04
@InverseFalcon you are right with un-needed repeated nodes, I've removed them accordingly. Regarding profiling and labels, you are definitely right for this example but I've left those ideas out because I'm trying to boil the solution down to the injective matching (as it is only an example) – Tim Atkinson Apr 05 '17 at 11:11
