
I was doing some benchmarking, so I had a SQL database with 2500 records. I inserted those records into DocumentDB.

I wrote two lines of code: one using Entity Framework to pull all 2500 into an array in C#, and the next to pull all 2500 into an array from DocumentDB.

Code used:

var test = await Task.Run(() =>
    client.CreateDocumentQuery<Test>(collection.DocumentsLink)
        .ToList());
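
The Entity Framework line isn't shown above; presumably it was the usual load-everything query, along these lines (the context and set names are hypothetical):

// Hypothetical EF6 counterpart - DbContext/DbSet names are assumptions;
// ToListAsync() comes from System.Data.Entity
var fromSql = await context.Tests.ToListAsync();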

The DocumentDB call took over 20 seconds; the SQL Server line was near instant. The objects are simple DTOs with five properties, and I ran the SQL query over the internet.

Am I misusing DocumentDB? I thought it was made to pull all your records into memory and then join them with LINQ.
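
That is, an in-memory join over the materialized lists, along these lines (the second list and the key/property names here are hypothetical):

// Hypothetical in-memory LINQ join over two fully materialized lists
var joined = from t in tests
             join o in others on t.Id equals o.TestId
             select new { t.Name, o.Value };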

bladefist
  • Just tried the same thing w/ Azure Table Storage - Near instant results. – bladefist Sep 02 '14 at 03:05
  • 2
    Find out where the time is being spent. Profile the process. Might be network roundtrips. Use Fiddler to see how many requests are being issued. – usr Sep 02 '14 at 11:13
  • 1
    Note it's not really applicable to compare a RDBMS to nonrelational. They're meant for storing different kinds of data models. If you want a more accurate comparison you need a rich object graph the kind that you would use EntityFramework for and a single .NET object will take 3-10 tables to store (multiple joins, subselects etc). You want to eagerly load the entire object with EF. Those exact same objects can be stored directly in DocumentDB. Then you want to compare the performance of `Foos.ToList()` – Chris Marisic Sep 05 '14 at 19:30

1 Answer


@bladefist, you should be able to achieve much better performance with DocumentDB. For example, take a look at this code stub and its output, from an Azure VM and a DocumentDB account both in West Europe.

Stopwatch watch = new Stopwatch();

// Read the whole collection ten times, timing each pass separately
for (int i = 0; i < 10; i++)
{
    watch.Start();
    int numDocumentsRead = 0;

    // MaxItemCount = 1000 requests the largest page size, so the 2500
    // documents come back in a handful of round trips
    foreach (Document d in client.CreateDocumentQuery(collection.SelfLink, 
        new FeedOptions { MaxItemCount = 1000 }))
    {
        numDocumentsRead++;
    }

    Console.WriteLine("Run {0} - read {1} documents in {2} ms", i, numDocumentsRead, 
        watch.Elapsed.TotalMilliseconds);
    watch.Reset();
}

//Output
Run 0 - read 2500 documents in 426.1359 ms
Run 1 - read 2500 documents in 286.506 ms
Run 2 - read 2500 documents in 227.4451 ms
Run 3 - read 2500 documents in 270.4497 ms
Run 4 - read 2500 documents in 275.7205 ms
Run 5 - read 2500 documents in 281.571 ms
Run 6 - read 2500 documents in 268.9624 ms
Run 7 - read 2500 documents in 275.1513 ms
Run 8 - read 2500 documents in 301.0263 ms
Run 9 - read 2500 documents in 288.1455 ms

Some best practices to follow for performance:

  • Use direct connectivity and the TCP protocol (see the sketch after this list)
  • Use a large page size (max: 1000) when reading in large batches, to minimize the number of round trips
  • To reduce latency, run your client in the same region as your DocumentDB account
  • The provisioned throughput (and storage) of the capacity units you purchase is spread across your collections, so to measure throughput, make sure your app distributes the workload across all of them. For instance, if you have purchased 1 CU, you can choose to direct all of the throughput to a single collection or spread it across three collections.
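
For the first bullet, a minimal sketch of opening the client in direct/TCP mode (the endpoint URI and auth key are placeholders, not real values):

using System;
using Microsoft.Azure.Documents.Client;

// Direct mode over TCP skips the HTTP gateway hop on every request.
// Replace the endpoint and key with your own account's values.
var client = new DocumentClient(
    new Uri("https://your-account.documents.azure.com:443/"),
    "<your-auth-key>",
    new ConnectionPolicy
    {
        ConnectionMode = ConnectionMode.Direct,
        ConnectionProtocol = Protocol.Tcp
    });
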
Aravind Krishna R.
  • 4
    Thanks. I ran your code from home and I was getting ~15 second response times. Copied the code to an Azure VM, but it's not in the same data center as the documentDB service. That ran at about ~5 seconds a response. That is pretty telling, you basically need to put everything in the same data center. I don't understand still though, as Azure Table storage, from home, is blazing fast. – bladefist Sep 07 '14 at 23:30
  • @bladefist Azure Table storage and Azure SQL are Generally Available services, so Microsoft has already optimized them across all data centres, but DocumentDB is in preview, so it is not optimized yet – Chandan Pasunoori Jan 08 '15 at 03:35
  • 1
    I'm having this exact same problem, only these suggestions don't fix it. I'm loading 5500 documents and it's taking around 30 seconds. Just like in the original question, loading data from Sql Azure or Table Storage is blazing fast. – BowserKingKoopa Apr 14 '16 at 00:54
  • @BowserKingKoopa I am facing a similar problem -> it takes about 19 seconds to get about 183 documents based on a name match. What did you do to improve the performance? – Jaya Apr 25 '16 at 18:48
  • @JS_GodBlessAll Things have gotten much worse. I updated to the latest DocumentDB .net sdk and the same query has gone from 30 seconds to 3.5 MINUTES! So I'm a bit at a loss for what to do. – BowserKingKoopa May 15 '16 at 00:52
  • While not apples to apples, I added 5000 documents using the storage emulator on my local PC and it took {00:00:00.5094307} to get the data back out. – Carol AndorMarten Liebster Dec 08 '16 at 21:58