We are taking a closer look at the Cosmos DB Gremlin (graph) API, and the RU cost of simple queries looks very high.

This simple query, which returns 1000 vertices with 4 properties each, costs 206 RU (all vertices live on the same partition key and all documents are indexed):

g.V('0').out()

0 is the id of the vertex

The longer form of the query is no better (208 RU):

g.V('0').outE('knows').inV()

Are we doing something wrong, or is this the expected price?

draco951

2 Answers

I have also been working with the Cosmos DB Gremlin API, and like you I am still trying to get a feel for its RU consumption.

Some of my experiences may be relevant for your use case:

  1. Adding filters to your queries can keep the engine from scanning through all the available vertices.

  2. .dedup() performed small miracles for me. I had two vertices, A and B, both connected to C, and C was in turn connected to other, irrelevant vertices. By running my query in small chunks with .executionProfile(), I found that the first step of my traversal, which fetches the vertices that A and B connect to, returned C twice. The engine therefore processed C twice when continuing to the irrelevant vertices. Adding a .dedup() step at that point reduced the output of the first step from two C records to one. Since the irrelevant vertices may themselves connect to further vertices, duplicates can show up at every step.

  3. You may not always need every vertex a query can return. Using the .range() step to limit the results to the number you actually need may also reduce RUs.

  4. Be aware of the known limitations documented by Microsoft.
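
To make points 1 to 3 a bit more concrete, here is a minimal sketch based on the query from the question. The `'knows'` label comes from the question; the vertex ids `'A'` and `'B'`, the property `name1` and the value `'someValue'` are placeholders I made up for illustration, and the actual RU savings will depend on your data. The `//` lines are only annotations; submit one query at a time without them.

```
// 1. Filter on the adjacent vertices instead of returning all of them
g.V('0').out('knows').has('name1', 'someValue')

// 2. Remove duplicates after the first hop, so that a vertex reached
//    from both A and B is only expanded once in the next hop
g.V('A', 'B').out().dedup().out()

// 3. Return at most the first 100 results
g.V('0').out('knows').range(0, 100)

// Append executionProfile() to any of the above to see
// where the time and the results go, step by step
g.V('0').out('knows').range(0, 100).executionProfile()
```

Comparing the executionProfile() output of a query with and without the extra step is probably the quickest way to tell whether it helps in your case.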

I hope this helps in some way.

Rúben B.
  • Using .range(0,1) or .range(0,100) costs nearly the same (55 RU); beyond that the cost grows. Does 55 RU seem cheap for such a simple query? – draco951 Sep 09 '22 at 11:12
  • Considering that a single point read (e.g. `g.V('0')`) costs something like 3 or 4 RUs, 55 RU to return a collection feels acceptable to me. Let me stress the usefulness of `.executionProfile()`: with it you can pinpoint where your traversal is doing the heavy lifting. Based on your comment, I imagine you tried something like `g.V('0').outE('knows').inV().range(0, 50)`. I would guess that in this case most of the effort goes into `.outE('knows')`, since the engine needs to scan for `Edge` records that carry the `'knows'` label. – Rúben B. Sep 09 '22 at 14:13

Instead of returning complete vertices, you can try returning only the vertex properties you need, using the Gremlin .project() step. Note that Cosmos DB does not seem to support other Gremlin steps for retrieving just some of a vertex's properties.

HadoopMarc
  • The following query costs exactly the same (206 RU): `g.V('0').out().project('name1','name2').by('name1').by('name2')` – draco951 Jul 26 '22 at 12:22