0

I am fairly new to graph databases, however I have used SQL Server and document databases (Lucene, DocumentDb, etc.) extensively. It's completely possible that I am approaching this query the wrong way, since I am new to graph databases. I am trying to convert some logic to a graph database (CosmosDB Graph via Gremlins to be specific) that we currently are using SQL Server for. The reason for the change is that this problem set is not really what SQL Server is great at and so our SQL query (which we have optimized as good as we can) is really starting to be the hot spot of our application.

To give a very brief overview of our logic, we run a web shop that allows admins to configure products and users with several levels of granular permissions (described below). Based on these permissions, we show the user only the products they are allowed to see.

Entities:

  • Region: A region consists of multiple countries
  • Country: A country has many markets and many regions
  • Market: A market is a group of stores in a single country
  • Store: A store is belongs to a single market

Users have the following set of permissions and each set can contain multiple values:

  • can-view-region
  • can-view-country
  • can-view-market
  • can-view-store

Products have the following set of permissions and each set can contain multiple values:

  • visible-to-region
  • visible-to-country
  • visible-to-market
  • visible-to-store

After trying for a few days, this is the query that I have come up with. This query does work and returns the correct products for the given user, however it takes about 25 seconds to execute.

g.V().has('user','username', 'john.doe').union(
    __.out('can-view-region').out('contains-country').in('in-market').hasLabel('store'),
    __.out('can-view-country').in('in-market').hasLabel('store'),
    __.out('can-view-market').in('in-market').hasLabel('store'),
    __.out('can-view-store')
).dedup().union(
    __.out('in-market').in('contains-country').in('visible-to-region').hasLabel('product'),
    __.out('in-market').in('visible-to-country').hasLabel('product'),
    __.out('in-market').in('visible-to-market').hasLabel('product'),
    __.in('visible-to-store').hasLabel('product')
).dedup()

Is there a better way to do this? Is this problem maybe not best suited with a graph database?

Any help would be greatly appreciated!

Thanks, Chris

Christopher Haws
  • 1,715
  • 21
  • 21
  • What part of the traversal is taking up the most time? More specifically, how much of the 25 seconds is taken up by the part of the traversal before the first `dedup()`? – stephen mallette Oct 04 '17 at 10:32
  • @stephenmallette Before the first dedup is about 30ms, so that is super fast. The output of that dedup is about 600 vertices. All of the rest of the time is in the second union. The output of the second union+dedup is arount 5000 vertices. – Christopher Haws Oct 04 '17 at 16:31
  • 1. Is this being run against a partitioned graph collection? If yes, which property are you using as the partition key. 2. Are you using Microsoft.Azure.Graphs .net SDK (if yes, which version) or are you using CosmosDB Graph service via a gremlin client? 3. Can you try removing the union and split this into two traversals and test their respective duration? This would tell determine if there is a problem with the `union` step. – Oliver Towers Oct 04 '17 at 17:50
  • @OliverTowers 1) No, it is not partitioned. 2) Yes, I am using v0.3.0-preview of the Microsoft.Azure.Graphs SDK. 3) I ran the second union against a single store and it takes about 350ms to execute. It seems like it is not running the second union in parallel (not sure if it is supposed to). Its running the first union (30ms), taking the 600 results of the first union and for each one it's running the second union (350ms). So (30ms + (600 results * 350ms)) = 21 seconds. I can't think of a better way to write this query though. – Christopher Haws Oct 04 '17 at 18:31

2 Answers2

1

I don't think this is going to help a lot, but here's an improved version of your query:

g.V().has('user','username', 'john.doe').union(
    __.out('can-view-region').out('contains-country').in('in-market').hasLabel('store'),
    __.out('can-view-country','can-view-market').in('in-market').hasLabel('store'),
    __.out('can-view-store')
).dedup().union(
    __.out('in-market').union(
      __.in('contains-country').in('visible-to-region'),
      __.in('visible-to-country','visible-to-market')).hasLabel('product'),
    __.in('visible-to-store').hasLabel('product')
).dedup()

I wonder if the hasLabel() checks are really necessary. If, for example, .in('in-market') can only lead a store vertex, then remove the extra check.

Furthermore it might be worth to create shortcut edges. This would increase write times whenever you mutate the permissions, but should significantly increase the read times for the given query. Since the reads are likely to occur way more often than permission updates, this might be a good trade-off.

Daniel Kuppitz
  • 10,846
  • 1
  • 25
  • 34
  • Removing the hasLabel's speeds the query up by about 2 seconds. I thought that passing multiple params to out or in resulted in `or` logic, not chaining. Neat. Thanks for the tip. As for the shortcut edges, I have thought about doing that, but I don't know of a good way of keeping the up to date. Its kind of what we do right now actually. Right now we have a lucene index where we explode the relations to make querying super fast (which works great) but indexing is extremely slow (sometimes up to 5min if the product has a region association). It also cant scale anymore. – Christopher Haws Oct 04 '17 at 22:42
  • Some of the shortcut edges would be simple (like market is always in 1 country, so we could just add a country edge) but even with that, I fear that the second union will always be slow with the current setup. Even if I could make the second query take 10ms (which would be a huge speedup), 600 verticies * 10ms would still be 6 seconds, which is not acceptable for a website. Do you know if there is a reason that it cant run in parallel? – Christopher Haws Oct 04 '17 at 22:46
  • Parallelism is something providers need to implement on their own, it's not provided by TinkerPop. I don't know about Janus' plans, but parallel query execution is on the todo list for DSE Graph. Anyway, the only other solution I can think of would be a completely different model, that uses multi-valued properties to store the permissions on every product vertex. – Daniel Kuppitz Oct 05 '17 at 17:19
  • Oh, forgot that you're using CosmosDB, thought it was Janus. But same thing, I don't know anything about their plans. – Daniel Kuppitz Oct 05 '17 at 17:23
1

CosmosDB Graph team is looking into improvements that can done on union step in particular.

Other options that haven't already been suggested:

  1. Reduce the number of edges that are traversed per hop with additional predicates. e.g: g.V('1').outE('market').has('prop', 'value').inV()
  2. Would it be possible to split the traversal up and do parallel request in your client code? Since you are using .NET, you could take each result in first union, and execute parallel requests for the traversals in the second union. Something like this (untested code):

    string firstUnion = @"g.V().has('user','username', 'john.doe').union(
        __.out('can-view-region').out('contains-country').in('in-market').hasLabel('store'),
        __.out('can-view-country').in('in-market').hasLabel('store'),
        __.out('can-view-market').in('in-market').hasLabel('store'),
        __.out('can-view-store')
    ).dedup()"
    
    string[] secondUnionTraversals = new[] {
        "g.V({0}).out('in-market').in('contains-country').in('visible-to-region').hasLabel('product')",
        "g.V({0}).out('in-market').in('visible-to-country').hasLabel('product')",
        "g.V({0}).out('in-market').in('visible-to-market').hasLabel('product')",
        "g.V({0}).in('visible-to-store').hasLabel('product')",
    };
    
    var response = client.CreateGremlinQuery(col, firstUnion);
    while (response.HasMoreResults)
    {
        var results = await response.ExecuteNextAsync<Vertex>();
        foreach (Vertex v in results)
        {
            Parallel.ForEach(secondUnionTraversals, (traversal) =>
            {
                var secondResponse = client.CreateGremlinQuery<Vertex>(col, string.Format(traversal, v.Id));
                while (secondResponse.HasMoreResults)
                {
                    concurrentColl.Add(secondResponse);
                }
            });
        }
    }
    
Oliver Towers
  • 445
  • 2
  • 7
  • I can give this a shot. I'm a little worried about doing this on a web server though due to thread exhaustion. Is there maybe a way to tell the query that I only want the first 20 results back? I have tried using `.range` but it's not really like linq where the expression knows to yield return after x records. Im really at a loss as to how to improve the performance of this query, regardless of tech. I will give this a try and let you know my results. – Christopher Haws Oct 05 '17 at 22:19