0

I'm having a bit of a problem removing duplicate paths from a return in Neo4j.

Here is the cypher query:

MATCH (subA:`Subaccount` {acct: 199345})-[:`contains`]->(intA:`Physical Interface`)-[conn:`connects`]-(intB:`Physical Interface`)<-[:`contains`]-(subB:`Subaccount`)
RETURN DISTINCT id(conn), subA.name, intA.name, intB.name AS `Connected Host Interface`, subB.name AS `Connected Host`, subB.acct AS `Connected Host Account`
ORDER BY id(conn);

This returns the following result set:

+-----------------------------------------------------------------------------------------------------------------------------------+
| id(conn) | subA.name           | intA.name     | Connected Host Interface | Connected Host               | Connected Host Account |
+-----------------------------------------------------------------------------------------------------------------------------------+
| 30832    | "fw.rrfaae.com"     | "ethernet0/0" | "Gi0/1"                  | "b3s5r12k11s1.liquidweb.com" | 22                     |
| 30833    | "fw.rrfaae.com"     | "ethernet0/1" | "Gi0/48"                 | "sw.rrfaae.com"              | 199345                 |
| 30833    | "sw.rrfaae.com"     | "Gi0/48"      | "ethernet0/1"            | "fw.rrfaae.com"              | 199345                 |
| 30834    | "dummy2.rrfaae.com" | "eth0"        | "Gi0/2"                  | "b3s5r12k11s1.liquidweb.com" | 22                     |
| 30835    | "dummy1.rrfaae.com" | "eth0"        | "Gi0/3"                  | "b3s5r12k11s1.liquidweb.com" | 22                     |
| 30836    | "dummy1.rrfaae.com" | "eth1"        | "eth1"                   | "dummy2.rrfaae.com"          | 199345                 |
| 30836    | "dummy2.rrfaae.com" | "eth1"        | "eth1"                   | "dummy1.rrfaae.com"          | 199345                 |
| 30837    | "sw.rrfaae.com"     | "Gi0/1"       | "eth0"                   | "web01.rrfaae.com"           | 199345                 |
| 30837    | "web01.rrfaae.com"  | "eth0"        | "Gi0/1"                  | "sw.rrfaae.com"              | 199345                 |
| 30838    | "sw.rrfaae.com"     | "Gi0/2"       | "eth1"                   | "web01.rrfaae.com"           | 199345                 |
| 30838    | "web01.rrfaae.com"  | "eth1"        | "Gi0/2"                  | "sw.rrfaae.com"              | 199345                 |
| 30840    | "sw.rrfaae.com"     | "Gi0/4"       | "eth1"                   | "web02.rrfaae.com"           | 199345                 |
| 30840    | "web02.rrfaae.com"  | "eth1"        | "Gi0/4"                  | "sw.rrfaae.com"              | 199345                 |
| 30841    | "sw.rrfaae.com"     | "Gi0/3"       | "eth0"                   | "web02.rrfaae.com"           | 199345                 |
| 30841    | "web02.rrfaae.com"  | "eth0"        | "Gi0/3"                  | "sw.rrfaae.com"              | 199345                 |
| 30842    | "db01.rrfaae.com"   | "serial"      | "serial"                 | "db02.rrfaae.com"            | 199345                 |
| 30842    | "db02.rrfaae.com"   | "serial"      | "serial"                 | "db01.rrfaae.com"            | 199345                 |
| 30843    | "sw.rrfaae.com"     | "Gi0/5"       | "eth0"                   | "db01.rrfaae.com"            | 199345                 |
| 30843    | "db01.rrfaae.com"   | "eth0"        | "Gi0/5"                  | "sw.rrfaae.com"              | 199345                 |
| 30844    | "sw.rrfaae.com"     | "Gi0/6"       | "eth1"                   | "db01.rrfaae.com"            | 199345                 |
| 30844    | "db01.rrfaae.com"   | "eth1"        | "Gi0/6"                  | "sw.rrfaae.com"              | 199345                 |
| 30845    | "db02.rrfaae.com"   | "eth2"        | "eth2"                   | "db01.rrfaae.com"            | 199345                 |
| 30845    | "db01.rrfaae.com"   | "eth2"        | "eth2"                   | "db02.rrfaae.com"            | 199345                 |
| 30846    | "sw.rrfaae.com"     | "Gi0/7"       | "eth0"                   | "db02.rrfaae.com"            | 199345                 |
| 30846    | "db02.rrfaae.com"   | "eth0"        | "Gi0/7"                  | "sw.rrfaae.com"              | 199345                 |
| 30847    | "sw.rrfaae.com"     | "Gi0/8"       | "eth1"                   | "db02.rrfaae.com"            | 199345                 |
| 30847    | "db02.rrfaae.com"   | "eth1"        | "Gi0/8"                  | "sw.rrfaae.com"              | 199345                 |
+-----------------------------------------------------------------------------------------------------------------------------------+

The one possible solution I was looking at recommend using a WHERE id(a) < id(b) type clause.

If subA wasn't being matched based on a property, this would work. However, since it is, some results that should return wouldn't:

Query:

MATCH (subA:`Subaccount` {acct: 199345})-[:`contains`]->(intA:`Physical Interface`)-[conn:`connects`]-(intB:`Physical Interface`)<-[:`contains`]-(subB:`Subaccount`)
WHERE id(subA) < id(subB)
RETURN DISTINCT id(conn), subA.name, intA.name, intB.name AS `Connected Host Interface`, subB.name AS `Connected Host`, subB.acct AS `Connected Host Account`
ORDER BY id(conn);

Result set:

+--------------------------------------------------------------------------------------------------------------------------+
| id(conn) | subA.name           | intA.name     | Connected Host Interface | Connected Host      | Connected Host Account |
+--------------------------------------------------------------------------------------------------------------------------+
| 30833    | "fw.rrfaae.com"     | "ethernet0/1" | "Gi0/48"                 | "sw.rrfaae.com"     | 199345                 |
| 30836    | "dummy1.rrfaae.com" | "eth1"        | "eth1"                   | "dummy2.rrfaae.com" | 199345                 |
| 30837    | "sw.rrfaae.com"     | "Gi0/1"       | "eth0"                   | "web01.rrfaae.com"  | 199345                 |
| 30838    | "sw.rrfaae.com"     | "Gi0/2"       | "eth1"                   | "web01.rrfaae.com"  | 199345                 |
| 30840    | "sw.rrfaae.com"     | "Gi0/4"       | "eth1"                   | "web02.rrfaae.com"  | 199345                 |
| 30841    | "sw.rrfaae.com"     | "Gi0/3"       | "eth0"                   | "web02.rrfaae.com"  | 199345                 |
| 30842    | "db01.rrfaae.com"   | "serial"      | "serial"                 | "db02.rrfaae.com"   | 199345                 |
| 30843    | "sw.rrfaae.com"     | "Gi0/5"       | "eth0"                   | "db01.rrfaae.com"   | 199345                 |
| 30844    | "sw.rrfaae.com"     | "Gi0/6"       | "eth1"                   | "db01.rrfaae.com"   | 199345                 |
| 30845    | "db01.rrfaae.com"   | "eth2"        | "eth2"                   | "db02.rrfaae.com"   | 199345                 |
| 30846    | "sw.rrfaae.com"     | "Gi0/7"       | "eth0"                   | "db02.rrfaae.com"   | 199345                 |
| 30847    | "sw.rrfaae.com"     | "Gi0/8"       | "eth1"                   | "db02.rrfaae.com"   | 199345                 |
+--------------------------------------------------------------------------------------------------------------------------+

In this case, b3s5r12k11s1.liquidweb.com has a lower ID than of the devices connected to it, but won't show up in subA due to it having it's acct property as 22.

It seems the DISTINCT qualifier in the RETURN statement works, but only if the only column is id(conn).

Any help would be greatly appreciated!

-Jason

Community
  • 1
  • 1
  • Is the problem that `subA` binds some, but not all, nodes that are also bound as `subB`, because some but not all `subB` have `acct=199345`, and `id(subA) 199345`? – jjaderberg Jul 16 '14 at 09:02
  • The problem is duplicate tuples are being returned (albeit the tuples are in a different order). It's perfectly ok, though, for subB.acct to be 199345 – jgillmanjr Jul 16 '14 at 12:31
  • I think if you include results where `subB.acct=199345` you'll get duplicates since your pattern is symmetric (the same path will be matched in both directions--duplicate tuples in different order). The cases where a `:Subaccount` has `acct=199345` is already covered under `subA`, you'll get all the valid paths. – jjaderberg Jul 16 '14 at 12:58
  • Actually, @ulkas updated query gave me some ideas, and it seems to have worked – jgillmanjr Jul 16 '14 at 13:22

2 Answers2

0

i believe when using the condition id(subA) < id(subB), you must ommit the relationship direction:

MATCH (subA:`Subaccount` {acct: 199345})-[:`contains`]-(intA:`Physical Interface`)-[conn:`connects`]-(intB:`Physical Interface`)-[:`contains`]-(subB:`Subaccount`)
WHERE id(subA) < id(subB)
RETURN DISTINCT id(conn), subA.name, intA.name, intB.name AS `Connected Host Interface`, subB.name AS `Connected Host`, subB.acct AS `Connected Host Account`
ORDER BY id(conn);

edit:

than you can use quite a longer statement with WITH:

MATCH (subA:`Subaccount` {acct: 199345})-[:`contains`]->(intA:`Physical Interface`)-[conn:`connects`]-(intB:`Physical Interface`)<-[:`contains`]-(subB:`Subaccount`)
WITH conn
MATCH (subA:`Subaccount`)-[:`contains`]->(intA:`Physical Interface`)-[conn:`connects`]-(intB:`Physical Interface`)<-[:`contains`]-(subB:`Subaccount`)
WHERE id(subA) < id(subB)
RETURN DISTINCT id(conn), subA.name, intA.name, intB.name AS `Connected Host Interface`, subB.name AS `Connected Host`, subB.acct AS `Connected Host Account`
ORDER BY id(conn);
ulkas
  • 5,748
  • 5
  • 33
  • 47
  • that wouldn't work (and I did test to verify) as in this case even though the node identified by `b3s5r12k11s1.liquidweb.com`has a lower ID than `fw.rrfaae.com`, `dummy2.rrfaae.com`, and `dummy1.rrfaae.com`, it won't show as a match for subA as a result of it not having it's `acct` property being 199345 – jgillmanjr Jul 15 '14 at 14:37
  • The longer statement does indeed seem to do the trick. It seems there would be a more elegant way of doing things though. – jgillmanjr Jul 16 '14 at 12:37
0

I'd still want @ulkas to get credit, as his updated query gave me the idea to try this:

Edit:

MATCH (subA:`Subaccount`)-[:`contains`]->(intA:`Physical Interface`)-[conn:`connects`]-(intB:`Physical Interface`)<-[:`contains`]-(subB:`Subaccount`)
WHERE (subA.acct = 199345 AND id(subA) < id(subB)) OR (subA.acct = 199345 AND subB.acct <> 199345)
RETURN subA.name, intA.name, intB.name, subB.name
ORDER BY subA.name;

Result set:

+-----------------------------------------------------------------------------------------------------------------------------------+
| id(conn) | subA.name           | intA.name     | Connected Host Interface | Connected Host               | Connected Host Account |
+-----------------------------------------------------------------------------------------------------------------------------------+
| 30832    | "fw.rrfaae.com"     | "ethernet0/0" | "Gi0/1"                  | "b3s5r12k11s1.liquidweb.com" | 22                     |
| 30833    | "fw.rrfaae.com"     | "ethernet0/1" | "Gi0/48"                 | "sw.rrfaae.com"              | 199345                 |
| 30834    | "dummy2.rrfaae.com" | "eth0"        | "Gi0/2"                  | "b3s5r12k11s1.liquidweb.com" | 22                     |
| 30835    | "dummy1.rrfaae.com" | "eth0"        | "Gi0/3"                  | "b3s5r12k11s1.liquidweb.com" | 22                     |
| 30836    | "dummy1.rrfaae.com" | "eth1"        | "eth1"                   | "dummy2.rrfaae.com"          | 199345                 |
| 30837    | "sw.rrfaae.com"     | "Gi0/1"       | "eth0"                   | "web01.rrfaae.com"           | 199345                 |
| 30838    | "sw.rrfaae.com"     | "Gi0/2"       | "eth1"                   | "web01.rrfaae.com"           | 199345                 |
| 30840    | "sw.rrfaae.com"     | "Gi0/4"       | "eth1"                   | "web02.rrfaae.com"           | 199345                 |
| 30841    | "sw.rrfaae.com"     | "Gi0/3"       | "eth0"                   | "web02.rrfaae.com"           | 199345                 |
| 30842    | "db01.rrfaae.com"   | "serial"      | "serial"                 | "db02.rrfaae.com"            | 199345                 |
| 30843    | "sw.rrfaae.com"     | "Gi0/5"       | "eth0"                   | "db01.rrfaae.com"            | 199345                 |
| 30844    | "sw.rrfaae.com"     | "Gi0/6"       | "eth1"                   | "db01.rrfaae.com"            | 199345                 |
| 30845    | "db01.rrfaae.com"   | "eth2"        | "eth2"                   | "db02.rrfaae.com"            | 199345                 |
| 30846    | "sw.rrfaae.com"     | "Gi0/7"       | "eth0"                   | "db02.rrfaae.com"            | 199345                 |
| 30847    | "sw.rrfaae.com"     | "Gi0/8"       | "eth1"                   | "db02.rrfaae.com"            | 199345                 |
+-----------------------------------------------------------------------------------------------------------------------------------+