Counting distinct nodes in path

Question

I have a graph with customers, transactions, and merchants with connections that look as follows:

(Customer)->(Transaction)->(Merchant).

I'm trying to return a new graph that connects merchants by the number of distinct customers they share (customers that have transacted with both merchants). This can be interpreted as the number of distinct customer nodes across all paths between a first Merchant node and a second Merchant node. Unfortunately, from what I can tell this is prohibitively expensive to do in Neo4j. To give you an idea of what I'm trying to do, here are some queries that I've tried:

MATCH (m1:Merchant)<-[:TRANSACTION_WITH_MERCHANT]-()<-[:CUSTOMER_MADE_TRANSACTION]-(c)-[:CUSTOMER_MADE_TRANSACTION]->()-[:TRANSACTION_WITH_MERCHANT]->(m2:Merchant)
RETURN m1, m2, count(distinct c)

MATCH (m1:Merchant), (m2:Merchant)
WHERE id(m1)<id(m2)
MATCH p=(m1)<-[:TRANSACTION_WITH_MERCHANT]-()<-[:CUSTOMER_MADE_TRANSACTION]-(c)-[:CUSTOMER_MADE_TRANSACTION]->()-[:TRANSACTION_WITH_MERCHANT]->(m2)
RETURN m1, m2, count(distinct c) as n_connections

I realize these queries are pretty nasty because of all the cartesian products, and the huge number of paths that need to be explored when customers have a lot of transactions. Are there any tricks to avoid exploring paths that pass through the same customer? Would creating a graph going directly from customer to merchants they've transacted with be best?

I appreciate any suggestions.

score 0 · Accepted Answer · answered Jan 22 '20 at 00:54

I solved my problem well enough to execute the query (although I'm sure there are better solutions). Long story short, I did two things:

Created a relationship directly between customers and merchants with a property indicating how many transactions that customer has with that merchant.
Removed merchants that had a low number of transactions in order to make the cross product more manageable (didn't see a way around this).

Creating an edge between customers and merchants (only for merchants with more than 40 transactions):

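// Count transactions per merchant and keep only merchants with more than 40.
match (m:Merchant)<-[r:TRANSACTION_WITH_MERCHANT]-()
with m, count(r) as nr_trans
where nr_trans > 40
// For each remaining merchant, count each customer's transactions with it.
match (c:Customer)-[:CUSTOMER_MADE_TRANSACTION]->(t:Transaction)-[:TRANSACTION_WITH_MERCHANT]->(m)
with c, m, count(t) as nr_transactions_with_merchant
// Create one direct customer-to-merchant edge carrying that count.
merge (c)-[:CUSTOMER_TRANSACTED_WITH_MERCHANT {nr_transactions:nr_transactions_with_merchant}]->(m);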
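
One caveat with the merge above: because nr_transactions is part of the merged pattern, rerunning the statement after new transactions arrive would create a second relationship for the same customer and merchant instead of updating the first. A minimal sketch of a rerunnable variant, assuming the same labels and relationship types as above (the variable name rel is only illustrative), merges on the relationship alone and sets the count afterwards:

```cypher
// Same filtering and counting as above.
match (m:Merchant)<-[r:TRANSACTION_WITH_MERCHANT]-()
with m, count(r) as nr_trans
where nr_trans > 40
match (c:Customer)-[:CUSTOMER_MADE_TRANSACTION]->(t:Transaction)-[:TRANSACTION_WITH_MERCHANT]->(m)
with c, m, count(t) as nr_transactions_with_merchant
// Merge on the relationship only, so at most one edge exists per customer-merchant pair.
merge (c)-[rel:CUSTOMER_TRANSACTED_WITH_MERCHANT]->(m)
// Overwrite the count on every run instead of creating a new edge.
set rel.nr_transactions = nr_transactions_with_merchant;
```

Keeping a single edge per pair also matters for the final query, since it counts customers without distinct.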

Dropping edges and merchants with a low number of transactions:

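// Find merchants with 40 or fewer transactions.
match (m:Merchant)<-[:TRANSACTION_WITH_MERCHANT]-(t:Transaction)
with m, count(t) as nr_transactions_with_merchant
where nr_transactions_with_merchant <= 40
// detach delete removes the merchant together with all of its relationships.
detach delete m;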

New query that runs!

MATCH (m1:Merchant), (m2:Merchant)
WHERE id(m1)<id(m2)
MATCH p=(m1)<-[:CUSTOMER_TRANSACTED_WITH_MERCHANT]-(c)-[:CUSTOMER_TRANSACTED_WITH_MERCHANT]->(m2)
RETURN m1, m2, count(c) as n_connections
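
Since the original goal was a new graph connecting merchants, the result of this query could also be written back as relationships. The following is only a sketch under assumptions: SHARES_CUSTOMERS_WITH is a hypothetical relationship type that does not appear in the original graph, and the count is only a distinct customer count if each customer has at most one CUSTOMER_TRANSACTED_WITH_MERCHANT edge per merchant. It also uses a single MATCH rather than pairing all merchants up front:

```cypher
// Expand from each customer to every pair of merchants they share.
MATCH (m1:Merchant)<-[:CUSTOMER_TRANSACTED_WITH_MERCHANT]-(c:Customer)-[:CUSTOMER_TRANSACTED_WITH_MERCHANT]->(m2:Merchant)
// Keep each unordered merchant pair once.
WHERE id(m1) < id(m2)
WITH m1, m2, count(c) AS n_connections
// SHARES_CUSTOMERS_WITH is a hypothetical relationship type chosen for illustration.
MERGE (m1)-[s:SHARES_CUSTOMERS_WITH]->(m2)
SET s.n_connections = n_connections;
```

Merchant pairs with no shared customers get no relationship, which matches the RETURN-based query, where such pairs produce no rows.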
