Can a group count query in Amazon Neptune, or any graph database, fail due to big data?

I mean, if a count exceeds the limit of the count data type, can there be an overflow?

1 Answer

Short answer

The semantics of the Gremlin query language (as defined by the TinkerPop code) specify that count() returns a 64-bit long. So yes, in principle: a count cannot exceed the range of a long, which is a signed type with a maximum value of 2^63 - 1 (about 9.2 × 10^18).
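
As a rough illustration of that range, the sketch below simulates signed 64-bit arithmetic in Python (whose own integers never overflow). The helper wrap_to_long is a hypothetical name, not part of TinkerPop or Neptune. It shows what silent two's-complement wraparound looks like, which is how plain addition on a Java long behaves; whether a given graph database wraps, raises an error, or fails some other way at that point is not something this sketch establishes.

```python
# Bounds of a signed 64-bit integer (the range of a Java long).
LONG_MIN = -(2 ** 63)
LONG_MAX = 2 ** 63 - 1


def wrap_to_long(value: int) -> int:
    """Reduce an arbitrary Python int to the signed 64-bit two's-complement range.

    Hypothetical helper for illustration only.
    """
    return (value + 2 ** 63) % 2 ** 64 - 2 ** 63


print(LONG_MAX)                    # 9223372036854775807
print(wrap_to_long(LONG_MAX + 1))  # -9223372036854775808
```

In other words, one increment past the maximum would turn a huge positive count into a huge negative one if nothing checks for it.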

Long answer

Having said that, let's calculate how much data you would need to insert into the DB to hit that threshold. Each entity (vertex, edge, or property) in the DB has a unique ID associated with it. Let us hypothetically assume that each entity is stored as just its identifier. Let us also assume that the identifier uses the most compact data type, i.e. a long (and not a String, which would take more space than a long).

To hit the limit of count, the DB would need to store at least 2^63 entities (a long is signed, so its maximum is 2^63 - 1), each with a unique identifier. That is at least 2^63 × 64 bits = 2^66 bytes of data, i.e. 64 EiB or roughly 74,000 petabytes, even at this very conservative estimate.
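
The estimate above can be checked with a few lines of arithmetic. This is only a sketch under the same simplifying assumptions as the text: every entity costs exactly one 8-byte identifier and nothing else, and the count is a signed 64-bit value. The constant names are made up for the example.

```python
# Assumption from the text: each entity is stored as a single 64-bit (8-byte) ID.
BYTES_PER_ID = 8
# Number of entities needed to go one past the largest signed 64-bit value.
ENTITIES_TO_OVERFLOW = 2 ** 63

total_bytes = ENTITIES_TO_OVERFLOW * BYTES_PER_ID
petabytes = total_bytes / 10 ** 15   # decimal petabytes
exbibytes = total_bytes // 2 ** 60   # binary exbibytes

print(f"{total_bytes} bytes")   # 73786976294838206464 bytes
print(f"{petabytes:,.0f} PB")   # 73,787 PB
print(f"{exbibytes} EiB")       # 64 EiB
```

Real storage would be larger still, since labels, property values, and indexes all take space beyond the bare identifier.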

The point is, you would need to store a huge amount of data before you hit the limit of count. If you are operating on that much data, a DB might not be the right storage solution for you.

Divij Vaidya