
I am looking to quickly insert multiple vertices using the Azure Cosmos DB Graph API. Most of the current Microsoft samples create the vertices one by one and execute a Gremlin query for each, like so:

IDocumentQuery<dynamic> query = client.CreateGremlinQuery<dynamic>(graph, "g.addV('person').property('id', 'thomas').property('name', 'Thomas').property('age', 44)");

while (query.HasMoreResults)
{                    
    foreach (dynamic result in await query.ExecuteNextAsync())  {   
        Console.WriteLine($"\t {JsonConvert.SerializeObject(result)}"); 
    }
    Console.WriteLine();
}


query = client.CreateGremlinQuery<dynamic>(graph, "g.addV('person').property('id', 'mary').property('name', 'Mary').property('lastName', 'Andersen').property('age', 39)");

while (query.HasMoreResults)
{                    
    foreach (dynamic result in await query.ExecuteNextAsync())  {   
        Console.WriteLine($"\t {JsonConvert.SerializeObject(result)}"); 
    }
    Console.WriteLine();
}

However, this is less than ideal when I want to create a couple of thousand vertices and edges to initially populate the graph, as it can take some time.

This is with the Microsoft.Azure.Graphs library v0.2.0-preview.

How can I efficiently add multiple vertices at once to Cosmos DB so I may later query using the Graph API syntax?

kmcnamee

5 Answers


I've found that the fastest way to seed your graph is actually to use the Document API. Using this technique I've been able to insert 5,500+ vertices/edges per second on a single development machine. The trick is to understand the format that Cosmos expects for both edges and vertices. Just add a couple of vertices and edges to your graph through the Gremlin API, then inspect the format of those documents by going to the Data Explorer in Azure and running the document query SELECT * FROM c.
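To make "the format that Cosmos expects" concrete, here is a minimal JavaScript sketch that turns a plain object into a vertex document. The layout (top-level `id` and `label`, each property stored as an array of `{ id, _value }` entries) is an assumption based on what Gremlin-created vertices commonly look like in the Data Explorer; compare it with your own SELECT * FROM c output before relying on it. The function name `toVertexDocument` is hypothetical, and a partitioned graph will also need its partition key property, which this sketch leaves out.

```javascript
// Sketch only: the document layout below is an ASSUMPTION modelled on what
// the Data Explorer typically shows for vertices created through Gremlin.
const crypto = require('crypto');

// Hypothetical helper: build a vertex document from a label, id and properties.
function toVertexDocument(label, id, props) {
  const doc = { id: String(id), label: String(label) };
  for (const [key, value] of Object.entries(props)) {
    if (key === 'id' || key === 'label') {
      throw new Error(`'${key}' is reserved and cannot be used as a property name`);
    }
    // Vertex properties are stored as arrays of { id, _value } entries,
    // which is how Gremlin's multi-valued properties are represented.
    doc[key] = [{ id: crypto.randomUUID(), _value: value }];
  }
  return doc;
}

const thomas = toVertexDocument('person', 'thomas', { name: 'Thomas', age: 44 });
console.log(JSON.stringify(thomas, null, 2));
```

Documents built this way would then be written with the ordinary Document (SQL) API create calls instead of one Gremlin query per vertex, which is where the speed-up described in this answer comes from.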

At work I've built a light ORM that uses reflection to take POCOs for edges and vertices and convert them to the format that you see in the portal. I'm hoping to open-source this soon, at which point I'll most likely release a NuGet package and an accompanying blog post. In the meantime, hopefully this points you in the right direction; let me know if you have more questions about this approach.
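The ORM itself isn't published, but the edge half of that conversion might look roughly like the following JavaScript sketch, where iterating an object's own keys stands in for reflection over a POCO. The underscore-prefixed fields (`_isEdge`, `_vertexId`, `_vertexLabel`, `_sink`, `_sinkLabel`) are assumptions modelled on Gremlin-created edges as they commonly appear in the Data Explorer, the helper name `toEdgeDocument` is hypothetical, and partition-related fields for partitioned graphs are omitted.

```javascript
// Sketch only: field names are ASSUMPTIONS; verify them against the edge
// documents your own graph contains (SELECT * FROM c in the Data Explorer).
const crypto = require('crypto');

// Hypothetical helper: build an edge document from `from` to `to`.
function toEdgeDocument(label, from, to, props = {}) {
  const doc = {
    id: crypto.randomUUID(),
    label,
    _isEdge: true,
    _vertexId: from.id,       // outgoing (source) vertex
    _vertexLabel: from.label,
    _sink: to.id,             // incoming (target) vertex
    _sinkLabel: to.label,
  };
  // Edge properties are assumed to be stored as plain values,
  // unlike the { id, _value } arrays used for vertex properties.
  for (const [key, value] of Object.entries(props)) {
    if (key in doc) throw new Error(`'${key}' collides with a reserved edge field`);
    doc[key] = value;
  }
  return doc;
}

const knows = toEdgeDocument(
  'knows',
  { id: 'thomas', label: 'person' },
  { id: 'mary', label: 'person' },
  { since: 2015 }
);
console.log(JSON.stringify(knows, null, 2));
```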

Jesse Carter
  • I'm trying to insert vertices and edges using only one CreateDocument call, but my graph discards most of the values. Can you share what you've uploaded to create the objects? Thanks! – Murilo Maciel Curti Oct 28 '17 at 17:00
  • @MuriloMacielCurti I had the same issue, but in my case the properties/values were not lost: I got them back when I used the DocumentDB API to retrieve them. They are just not visible in the Azure graph UI; instead of the properties it shows the correct id and a label with the value 'NativeVertex'. Unfortunately I was not able to find more details on this. – Chief Wiggum Nov 08 '17 at 09:21
  • 1
    @JesseCarter can you share more on this? Thanks ChiefWiggum – Murilo Maciel Curti Nov 18 '17 at 11:42
  • Once you have the correct GraphSON, how do you upload it as a graph? I tried the migration tool without success. – François Feb 27 '18 at 07:36
  • 1
    @JesseCarter did you release the said ORM? – kDar Apr 23 '18 at 09:43
  • @Jesse Carter Have you written the blog post on this? – Jai Nov 22 '18 at 05:41

Assuming Cosmos DB is 100% TinkerPop-compliant, and depending on the Gremlin executor timeout setting, you should be able to update your Gremlin script to do several operations at once.

For example:

g.addV('person').property('id', 'mary').property('name', 'Mary').property('lastName', 'Andersen').property('age', 39)

can be transformed into:

g.addV('person').property('id', 'mary').property('name', 'Mary').property('lastName', 'Andersen').property('age', 39); g.addV('person').property('id', 'david').property('name', 'David').property('lastName', 'P').property('age', 24);

and so on.

Your Gremlin script is also just Groovy code, so you could even write loops and the like to create vertices, append properties, etc.
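Since the comments below report that Cosmos DB runs only the first of several semicolon-separated statements, a different technique may be worth trying: chaining several `addV()` steps into one traversal (`g.addV(...).property(...).addV(...)...`), which is valid Gremlin because each `addV()` step adds one vertex per incoming traverser. The JavaScript sketch below generates such a query from a list of objects. The helper names `gremlinLiteral` and `buildAddVerticesQuery` are hypothetical; inlining escaped values is for illustration only (parameter bindings are safer), and very large inputs should probably be split into several smaller queries. Whether your Cosmos DB endpoint accepts long chained traversals is something to verify.

```javascript
// Render a JS value as a Gremlin/Groovy literal (numbers and booleans bare,
// everything else as an escaped single-quoted string).
function gremlinLiteral(value) {
  if (typeof value === 'number' && Number.isFinite(value)) return String(value);
  if (typeof value === 'boolean') return String(value);
  // Escape backslashes first, then single quotes.
  return `'${String(value).replace(/\\/g, '\\\\').replace(/'/g, "\\'")}'`;
}

// Build ONE traversal that adds every vertex by chaining addV() steps,
// instead of separate statements joined with semicolons.
function buildAddVerticesQuery(vertices) {
  if (vertices.length === 0) throw new Error('no vertices to add');
  const steps = vertices.map(({ label, ...props }) => {
    const properties = Object.entries(props)
      .map(([key, value]) => `.property(${gremlinLiteral(key)}, ${gremlinLiteral(value)})`)
      .join('');
    return `addV(${gremlinLiteral(label)})${properties}`;
  });
  return 'g.' + steps.join('.');
}

const query = buildAddVerticesQuery([
  { label: 'person', id: 'mary', name: 'Mary', lastName: 'Andersen', age: 39 },
  { label: 'person', id: 'david', name: 'David', lastName: 'P', age: 24 },
]);
console.log(query);
```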

David
  • Thanks, I suspected that it should be possible to semicolon-separate multiple g.addV statements as you point out in your answer; however, the Cosmos DB Graph API appears to run just the first statement and none of the subsequent ones. For example, running the semicolon-separated addV statements and then running g.V().count() on an empty collection returns just 1. Maybe this is something specific to the Cosmos DB Graph API implementation? – kmcnamee Jun 02 '17 at 15:33
  • I would be terribly surprised if that were true. Is this client code open source? You could take a look and read the code there to see how it handles the script you submit. But I really would imagine it just forwards the exact script to the GremlinServer. – David Jun 02 '17 at 15:55
  • Thanks for the suggestion. The SDK source doesn't seem to be open source yet. I did, however, spin up Fiddler and look at what the Cosmos DB Graph API sends to the server. It doesn't seem to send the Gremlin script for the addV; instead it sends the JSON for the object it's going to insert, so the client library appears to do some conversion. For a simple g.V().count() it sends JSON with a query of the form {"query":"SELECT N_0 FROM Node N_0 WHERE (IS_DEFINED(N_0._isEdge) = false )"} – kmcnamee Jun 02 '17 at 17:34

We needed a tool to help us migrate data to Cosmos DB graph, but since nothing was available I ended up creating this: https://github.com/microsoft/migratetograph

You can use it to take data from SQL or JSON, transform it and push it to a graph database. It supports parallel execution of Gremlin queries, so it is considerably faster.
By default it fires 10 Gremlin queries in parallel, but you can increase this by setting batchSize in the graph-config file.
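The idea behind that batchSize setting, keeping a fixed number of queries in flight at once, can be sketched in plain JavaScript as below. This is not code from migratetograph: `runInParallel` is a hypothetical name, and `submit` stands in for whatever function actually sends one Gremlin query to your graph and returns a promise.

```javascript
// Run `submit(query)` for every query, with at most `batchSize` in flight.
// `submit` is a hypothetical stand-in for your real Gremlin client call.
async function runInParallel(queries, submit, batchSize = 10) {
  const results = new Array(queries.length);
  let next = 0;

  // Each worker repeatedly claims the next unstarted query until none remain.
  async function worker() {
    while (next < queries.length) {
      const index = next++; // safe: JavaScript runs this synchronously
      results[index] = await submit(queries[index]);
    }
  }

  const workerCount = Math.min(batchSize, queries.length);
  await Promise.all(Array.from({ length: workerCount }, worker));
  return results; // results keep the same order as `queries`
}

// Usage with a fake submit that just waits a little.
const fakeSubmit = (q) => new Promise((resolve) => setTimeout(() => resolve(`ok: ${q}`), 5));
const queries = Array.from({ length: 25 }, (_, i) => `g.addV('person').property('id', 'p${i}')`);
runInParallel(queries, fakeSubmit, 10).then((r) => console.log(`${r.length} queries done`));
```

A single failed query rejects the returned promise while the remaining workers keep running, so a real import would likely want retries or per-query error handling inside `submit`.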

Abbas Cyclewala

The Data Migration Tool may support SQL API or MongoDB scenarios, but at this stage it DOES NOT support Graph API vertices and edges out of the box. As mentioned earlier, you could probably use a generated graph query result as the reference pattern and then do some search-and-replace on your source to end up with the proper format, though I found that simply running a console application that streams the data may be more adequate. I was able to reuse the same console app for both the Marvel and the airport flights scenarios, and all I needed to do was modify a couple of lines of code each time.

The code runs in two sequences. The first block extracts and converts the vertices. The second sequence extracts the relationship fields and converts them into edges. All I needed to modify was the fields to extract. Depending on the size of the data this may take a while to convert, but it gave me exactly the expected results each time without having to keep modifying data at the source.
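The two-sequence approach can be sketched in JavaScript as two small functions: the first pass turns source rows into vertex descriptions, the second turns a relationship field into edge descriptions. The row shape, the field names (`id`, `name`, `friendIds`), the labels and the function names are all hypothetical examples; in practice the output of each pass would be fed to whatever writes vertices and edges to your graph.

```javascript
// Pass 1: one vertex per source row, copying only the listed fields.
function extractVertices(rows, label, fields) {
  return rows.map((row) => {
    const vertex = { label, id: String(row.id) };
    for (const field of fields) {
      if (row[field] !== undefined) vertex[field] = row[field];
    }
    return vertex;
  });
}

// Pass 2: one edge per id found in the relationship field of each row.
function extractEdges(rows, label, relationField) {
  const edges = [];
  for (const row of rows) {
    for (const targetId of row[relationField] ?? []) {
      edges.push({ label, from: String(row.id), to: String(targetId) });
    }
  }
  return edges;
}

// Hypothetical source data.
const rows = [
  { id: 1, name: 'Thomas', friendIds: [2] },
  { id: 2, name: 'Mary', friendIds: [] },
];

// Run pass 1 to completion before pass 2, so every edge endpoint already exists.
const vertices = extractVertices(rows, 'person', ['name']);
const edges = extractEdges(rows, 'knows', 'friendIds');
console.log(vertices, edges);
```

Reusing the app for a new dataset then mostly means changing the `fields` list and the relationship field name, which matches the "modify a couple of lines" experience described above.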

PeteZaria

I'm using this code to upsert multiple vertices with Node.js:

const gremlin = require('gremlin');
const __ = gremlin.process.statics;
const { cardinality, t } = gremlin.process;
// `g` is an existing GraphTraversalSource; run this inside an async function

let trt = await g.withBulk(true)
        .V('test-3').fold().coalesce(__.unfold().property(cardinality.single, 'runways', 4), __.addV('truongtest').property(t.id, 'test-3').property(cardinality.single, 'runways', 4))
        .V('test-10').fold().coalesce(__.unfold().property(cardinality.single, 'runways', 100), __.addV('truongtest').property(t.id, 'test-10').property(cardinality.single, 'runways', 100))
        .next();

// if you want to add a lot, build the traversal in a loop
const items = [{ id: 'test-3', runways: 4 }, { id: 'test-10', runways: 100 }];
let traversal = g.withBulk(true);
for (const { id, runways } of items) {
    traversal = traversal.V(id).fold().coalesce(
        __.unfold().property(cardinality.single, 'runways', runways),
        __.addV('truongtest').property(t.id, id).property(cardinality.single, 'runways', runways));
}

// when done, run next() once
trt = await traversal.next();
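The traversal in this answer is sent as bytecode by the fluent `gremlin` API. If your Cosmos DB endpoint only accepts Gremlin scripts, the same fold()/coalesce() upsert can be built as a script string plus parameter bindings, for example to pass to the `gremlin` package's `client.submit(script, bindings)`. The sketch below only builds that pair; the function name `buildUpsertScript` and the binding names (`vlabel`, `vid0`, `runways0`, ...) are hypothetical, and the `single` keyword mirrors `cardinality.single` from the answer, so check that your server accepts it.

```javascript
// Build one upsert script (plus bindings) covering every item, chaining
// V(id).fold().coalesce(update, create) blocks the same way the answer does.
function buildUpsertScript(label, items) {
  if (items.length === 0) throw new Error('no items to upsert');
  const bindings = { vlabel: label };
  const parts = items.map((item, i) => {
    bindings[`vid${i}`] = item.id;
    bindings[`runways${i}`] = item.runways;
    return `V(vid${i}).fold().coalesce(` +
      // existing vertex: overwrite the property value
      `unfold().property(single, 'runways', runways${i}), ` +
      // missing vertex: create it with the id and property
      `addV(vlabel).property('id', vid${i}).property(single, 'runways', runways${i}))`;
  });
  return { script: 'g.' + parts.join('.'), bindings };
}

const { script, bindings } = buildUpsertScript('truongtest', [
  { id: 'test-3', runways: 4 },
  { id: 'test-10', runways: 100 },
]);
console.log(script);
console.log(bindings);
```

Keeping values in bindings rather than concatenating them into the script avoids quoting problems with ids that contain quotes.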