
I need a scalable solution for building a connected graph of Geohash nodes.

I found Cypher for Apache Spark (CAPS), a project that lets you run Cypher on Spark DataFrames to create a graph. However, it can only create immutable graphs by mapping the different DataFrames, so I couldn't build the graph I need.

I can get the graph I need by running some other Cypher queries in the Neo4j Browser, but my stored graph is about 200 GB.

So my question is: is it reasonable, and fast enough, to run queries over 200 GB of graph data using the Neo4j Browser and APOC procedures?

A.HADDAD
  • Your problem is really about how many resources you need to query 200 GB of data. I don't know whether you have a server or a local machine (adding more RAM and CPU could help, but that stops scaling at some point). You may need to distribute your data over several instances to deal with the size of your graph, or store it in a Hadoop cluster or something similar. Running this kind of query on a large graph in the Neo4j Browser may not be a good idea, especially if you use the graph view with many nodes and relationships; you should use the text view. – juanbits Aug 20 '18 at 19:19
  • The graph would be stored on an AWS server, so the question is about query speed. – A.HADDAD Aug 20 '18 at 19:36
  • In that case, you only need to make sure your queries are optimized. – juanbits Aug 20 '18 at 19:37
  • We need an open-source solution for our project, so the commercial Amazon Neptune isn't an option for us. – A.HADDAD Aug 20 '18 at 19:45

1 Answer


If you're asking if Neo4j can handle databases of this size, then the answer is yes. But you'll see different results depending on how your data is modeled and the kind of queries you want to run.

Performance correlates not with the overall size of the graph, but with the portion of the graph touched and traversed by your queries. Graph-wide analytical queries must touch the entire graph, while tightly scoped queries that touch only a small, local part of the graph will be quite quick.
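As an illustration (using a hypothetical `:Geohash` label and `hash` property, since your actual model isn't shown), compare a graph-wide aggregation with a tightly scoped lookup-and-traverse:

```cypher
// Graph-wide: touches every :Geohash node and NEIGHBOR relationship (slow at 200 GB)
MATCH (g:Geohash)-[:NEIGHBOR]->(n)
RETURN count(n);

// Tightly scoped: one index lookup, then a short local traversal (fast regardless of total size)
MATCH (g:Geohash {hash: 'u4pruyd'})-[:NEIGHBOR*1..2]->(n)
RETURN n.hash;
```

Both queries are valid against any graph using that model; the second only becomes an index lookup once an index on `:Geohash(hash)` exists.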

Anything you can do in your queries to constrain the portion of the graph you have to traverse or filter will help your query speed, so good modeling and use of indexes and constraints is key.
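For example (again assuming the hypothetical `:Geohash` model above, and a `neighbors` list property holding adjacent geohash strings), a uniqueness constraint gives you an index for free, and `apoc.periodic.iterate` lets you build the relationships in small batches rather than one enormous transaction:

```cypher
// A uniqueness constraint doubles as an index on :Geohash(hash)  (Neo4j 3.x syntax)
CREATE CONSTRAINT ON (g:Geohash) ASSERT g.hash IS UNIQUE;

// Batch the relationship creation with APOC to keep transactions small
CALL apoc.periodic.iterate(
  "MATCH (g:Geohash) RETURN g",
  "UNWIND g.neighbors AS h
   MATCH (n:Geohash {hash: h})
   MERGE (g)-[:NEIGHBOR]->(n)",
  {batchSize: 10000, parallel: false});
```

This keeps each write transaction bounded at 10,000 source nodes, which matters far more than raw graph size when you're populating a 200 GB store.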

InverseFalcon