I am working on a little project and I have a dataset of about 60k nodes and 500k relationships between those nodes. The nodes are of two types. First type are are recipes and the second type are ingredients. Recipes are composed of ingredients like:
(ingredient)-[:IS_PART_OF]->(recipe)
My objective is to find how many common ingredients two recipes share. I have managed to obtain this information with the following query that compares one recipe to all others (the first one with all others):
MATCH (recipe:RECIPE{ ID: 1000000 }),(other)
WHERE (other.ID >= 1000001 AND other.ID <= 1057690)
OPTIONAL MATCH (recipe:RECIPE)<-[:IS_PART_OF]-(ingredient:INGREDIENT)- [:IS_PART_OF]->(other)
WITH ingredient, other
RETURN other.ID, count(distinct ingredient.name)
ORDER BY other.ID DESC
My first question: How can I obtain the number of all ingredients of two recipes in a way that the mutual ones are counted only once (union of R1 and R2 --> R1 U R2)
My second question: is it possible to write a loop that would iterate through all the recipes and check for common ingredients? The objective is to compare each recipe with all others. I think this should return (n-1)*(n/2) rows.
I have tried the above and the problem remains. Even with LIMIT
and SKIP
I can not run the code on the whole set. I have changed my query so it allows me to partition my set accordingly:
MATCH (recipe1)<-[:IS_PART_OF]-(ingredient:INGREDIENT)-[:IS_PART_OF]->(recipe2)
WHERE (recipe2.ID >= 1000000 AND recipe2.ID <= 1000009) AND (recipe1.ID >= 1000000 AND recipe1.ID <= 1000009) AND (recipe1.ID < recipe2.ID)
RETURN recipe1.ID, count(distinct ingredient.name) AS MutualIngredients, recipe2.ID
ORDER BY recipe1.ID
Until I get my hands on a better machine this will suffice.
I still haven't solved my first question: how can I obtain the number of all ingredients of two recipes in a way that the mutual ones are counted only once (union of R1 and R2 --> R1 U R2)