I have a graph of customers, transactions, and merchants, connected as follows:
(Customer)->(Transaction)->(Merchant).
I'm trying to derive a new graph that connects merchants to each other, weighted by the number of distinct customers they share (customers who have transacted with both merchants). Put differently, the weight is the number of distinct Customer nodes across all paths between one Merchant node and another. Unfortunately, from what I can tell, this is prohibitively expensive to do in Neo4j. To give you an idea of what I'm after, here are two queries I've tried:
MATCH (m1:Merchant)<-[:TRANSACTION_WITH_MERCHANT]-()<-[:CUSTOMER_MADE_TRANSACTION]-(c)-[:CUSTOMER_MADE_TRANSACTION]->()-[:TRANSACTION_WITH_MERCHANT]->(m2:Merchant)
RETURN m1, m2, count(distinct c)

MATCH (m1:Merchant), (m2:Merchant)
WHERE id(m1)<id(m2)
MATCH p=(m1)<-[:TRANSACTION_WITH_MERCHANT]-()<-[:CUSTOMER_MADE_TRANSACTION]-(c)-[:CUSTOMER_MADE_TRANSACTION]->()-[:TRANSACTION_WITH_MERCHANT]->(m2)
RETURN m1, m2, count(distinct c) as n_connections
I realize these queries are pretty nasty because of the Cartesian products and the huge number of paths that have to be explored when customers have many transactions. Are there any tricks to avoid re-exploring paths that pass through the same customer? Would it be better to first create direct relationships from each customer to the merchants they've transacted with?
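For reference, here is roughly what I have in mind for that last idea. It's an untested sketch, not something I've run on the full data: TRANSACTED_WITH is a relationship type I made up, and I'm assuming the customer nodes carry a Customer label.

```cypher
// Step 1 (one-off): collapse each customer's transactions into a single
// direct relationship per (customer, merchant) pair.
// TRANSACTED_WITH is a hypothetical relationship type, not in my current schema.
MATCH (c:Customer)-[:CUSTOMER_MADE_TRANSACTION]->()-[:TRANSACTION_WITH_MERCHANT]->(m:Merchant)
WITH DISTINCT c, m
MERGE (c)-[:TRANSACTED_WITH]->(m);

// Step 2: count shared customers over the collapsed relationships.
// Each customer now contributes at most one path per merchant pair,
// so DISTINCT here is only a safeguard.
MATCH (m1:Merchant)<-[:TRANSACTED_WITH]-(c:Customer)-[:TRANSACTED_WITH]->(m2:Merchant)
WHERE id(m1) < id(m2)
RETURN m1, m2, count(DISTINCT c) AS n_connections;
```

My thinking is that the second query would no longer multiply paths by the number of transactions per customer, but I don't know whether the one-off MERGE step is itself feasible at my data size.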
I appreciate any suggestions.