11

What is the easiest & most efficient way to count the number of nodes/edges in a large graph via Gremlin? The best I have found is using the V iterator:

gremlin> g.V.gather{it.size()}

However, this is not a viable option for large graphs, per the documentation for V:

The vertex iterator for the graph. Utilize this to iterate through all the vertices in the graph. Use with care on large graphs unless used in combination with a key index lookup.

bcm360

3 Answers

10

I think the preferred way to do a count of all vertices would be:

gremlin> g = TinkerGraphFactory.createTinkerGraph()
==>tinkergraph[vertices:6 edges:6]
gremlin> g.V.count()
==>6
gremlin> g.E.count()
==>6

That said, I think that on a very large graph g.V and g.E simply break down no matter what you do, because a count has to iterate over every vertex or edge. On a very large graph the best option for doing a count is to use a tool like Faunus (http://thinkaurelius.github.io/faunus/) so that you can leverage the power of Hadoop to do the counts in parallel.

UPDATE: The original answer above was for TinkerPop 2.x. For TinkerPop 3.x the answer is largely the same: counting at that scale implies the use of Gremlin Spark or some provider-specific tooling (like DSE GraphFrames for DataStax Graph) that is optimized for those kinds of large-scale traversals.
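To make the "count in parallel" idea concrete, here is a minimal sketch in plain Python. It is not the Faunus or Spark API; it only illustrates the map/reduce shape those tools rely on: each worker scans one partition and the partial counts are summed. The names `count_partition` and `parallel_count`, and the way the vertex ids are split, are hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def count_partition(partition):
    """Map step: count the elements of one partition by scanning it."""
    return sum(1 for _ in partition)


def parallel_count(partitions, max_workers=4):
    """Reduce step: add up the per-partition counts."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(count_partition, partitions))


# Hypothetical data: 10,000 vertex ids split into 4 partitions.
vertex_ids = range(10_000)
partitions = [vertex_ids[i::4] for i in range(4)]

total = parallel_count(partitions)
print("vertex count:", total)
```

Every element is still read once; the gain comes only from spreading the scan across workers, which is why a distributed engine helps where a single g.V iteration does not.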

stephen mallette
  • Great, thank you! I was thinking the metadata might be tracked & accessible somewhere, but Faunus does sound like a solid alternative. – bcm360 Jun 20 '13 at 16:19
  • I don't know that any graphs track that as metadata explicitly, though if any did it would be specific to the operations of the graph itself. There is nothing in Blueprints/Gremlin that gets at that count directly, nor does Blueprints expose metadata in any way. If you found that a graph implementation did have metadata to get you this information you could likely access it by getting the underlying graph with `getRawGraph()`. – stephen mallette Jun 20 '13 at 17:02
  • Is it possible to run both g.V.count() and g.E.count() in a single query, and consume the result of both? – EasyQuestions Jun 09 '20 at 20:44
  • Yes, it can be done, but note that the same warnings mentioned in this answer still apply, only doubly (or more) so, because now you're reading not only every vertex but also every edge in the graph. Anyway, a proper answer requires more than what should be written in a Stack Overflow comment. It deserves its own question and answer; if you create one and link it here, I'd be happy to answer it. – stephen mallette Jun 09 '20 at 21:48
1

I tried the above, but it didn't work for me. For some of you, this may work instead:

gremlin> g.V.count()
{"detailedMessage":"Query parsing failed at line 1, character position at 3, error message : no viable alternative at input 'g.V.'","code":"MalformedQueryException","requestId":"99f749db-c240-9834-aa12-e17bb21e598e"}
Type ':help' or ':h' for help.
Display stack trace? [yN]
gremlin> g.V().count()
==>37
gremlin> g.E().count()
==>45
gremlin> 

Use g.V().count() instead of g.V.count() (for those for whom the other command errors out).

Hari Kishore
0

Via Python (gremlin_python):

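from gremlin_python.structure.graph import Graph
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# Address of the Gremlin Server endpoint; adjust it for your setup.
graph_db_uri = 'ws://localhost/gremlin'

# Open a remote connection and bind a traversal source to the server-side 'g'.
connection = DriverRemoteConnection(graph_db_uri, 'g')
g = Graph().traversal().withRemote(connection)

# Count the vertices that carry a given label.
count = g.V().hasLabel('node_label').count().next()
print("vertex count:", count)

# Count the edges that carry a given label.
count = g.E().hasLabel('edge_label').count().next()
print("edge count:", count)

# Release the connection when done.
connection.close()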