I am creating a PoC using Azure Cosmos DB for Apache Gremlin to model and query/display the Organization Chart. Given a node, the service needs to traverse the graph and pull all children recursively so the UI can display the Organization Chart. For each traversed node, I need to get few attributes e.g. Name, ImageUrl, Direct Manager etc. I was able to put together a query that gives me the desired data but it seems to be very inefficient. It works but as I go up the hierarchy, it starts to timeout. Currently, I don't even have a very large dataset, just about a sample organization with ~500 employees and about 5 levels in handful of paths. The UI component requires the data to be flat i.e. id, name, ..., parentId
.
Here is how the model looks like
Here is the query that is being used
g.V('rootUserId')
.emit()
.repeat(out('manages'))
.until(__.not(outE('manages')))
.path()
.by(project('orgUser', 'reportsTo')
.by(valueMap(true))
.by(out('reportsTo')
.valueMap(true)
.dedup()
.fold()))
.unfold()
.dedup()
How can this be optimized? Needless to say, I don't expect this to execute in few seconds. Since this data is slow changing, the queried data would be cached but I need to be able to query it given a root user which could be at level 2 or level 10 (based on UI selection). CosmosDB has a hard limit on execution time (30 seconds as per documentation but I am observing 60 seconds). Eventually, I would like this to scale to a few thousands employees organization. If this model and/or querying method is not the right approach, what is the recommended approach?