
I'm doing some tests to see what kind of throughput I can get from MongoDB. The documentation says that capped collections are the fastest option, but I often find that I can write to a normal collection much faster. Depending on the exact test, I often get twice the throughput with a normal collection.

Am I missing something? How do I troubleshoot this?

I have a very simple C++ program that writes about 64,000 documents to a collection as fast as possible. I record the total time, and the time that I'm waiting for the database. If I change nothing but the collection name, I can see a clear difference between the capped and normal collections.
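
For reference, here is a minimal sketch of that kind of timing loop, written as a mongo shell script rather than the actual C++ client (which isn't shown here); the field names and document size are made up for illustration, and shell overhead will not match the C++ driver:

    // timing-loop.js -- run with: mongo tutorial timing-loop.js
    var start = new Date();
    for (var i = 0; i < 64000; i++) {
        db.capped.insert({ seq : i, payload : new Array(1400).join("x") });
    }
    db.getLastError();   // wait for the last write to be acknowledged
    print("elapsed ms: " + (new Date() - start));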

> use tutorial
switched to db tutorial
> db.system.namespaces.find()
{ "name" : "tutorial.system.indexes" }
{ "name" : "tutorial.persons.$_id_" }
{ "name" : "tutorial.persons" }
{ "name" : "tutorial.persons.$age_1" }
{ "name" : "tutorial.alerts.$_id_" }
{ "name" : "tutorial.alerts" }
{ "name" : "tutorial.capped.$_id_" }
{ "name" : "tutorial.capped", "options" : { "create" : "capped", "capped" : true, "size" : 100000000 } }
> db.alerts.stats()
{
    "ns" : "tutorial.alerts",
    "count" : 400000,
    "size" : 561088000,
    "avgObjSize" : 1402.72,
    "storageSize" : 629612544,
    "numExtents" : 16,
    "nindexes" : 1,
    "lastExtentSize" : 168730624,
    "paddingFactor" : 1,
    "systemFlags" : 1,
    "userFlags" : 0,
    "totalIndexSize" : 12991664,
    "indexSizes" : {
        "_id_" : 12991664
    },
    "ok" : 1
}
> db.capped.stats()
{
    "ns" : "tutorial.capped",
    "count" : 62815,
    "size" : 98996440,
    "avgObjSize" : 1576,
    "storageSize" : 100003840,
    "numExtents" : 1,
    "nindexes" : 1,
    "lastExtentSize" : 100003840,
    "paddingFactor" : 1,
    "systemFlags" : 1,
    "userFlags" : 0,
    "totalIndexSize" : 2044000,
    "indexSizes" : {
        "_id_" : 2044000
    },
    "capped" : true,
    "max" : 2147483647,
    "ok" : 1
}
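
(For completeness: the capped collection above was presumably created with something like the command below; the exact call isn't shown, but the size matches the "options" entry in system.namespaces.)

    > db.createCollection("capped", { capped : true, size : 100000000 })
    { "ok" : 1 }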

Linux version: 3.4.11-1.fc16.x86_64

MongoDB version: db version v2.2.2, pdfile version 4.5

This is a dedicated machine doing nothing but running the MongoDB server and my test client. The machine is ridiculously overpowered for this test.

Trade-Ideas Philip
  • Can you provide a link to the documentation you've seen that indicates inserting to capped collections is faster? – JohnnyHK Jan 18 '13 at 04:53
  • http://docs.mongodb.org/manual/core/capped-collections/ "Inserting documents in a capped collection without an index is close to the speed of writing log information directly to a file system." – Trade-Ideas Philip Jan 18 '13 at 06:53

2 Answers


I see the problem. The web page I quoted above says that a capped collection "without an index" will offer high performance. But…

http://docs.mongodb.org/manual/core/indexes/ says "Before version 2.2 capped collections did not have an _id field. In 2.2, all capped collections have an _id field, except those in the local database."

I created another version of my test which writes to a capped collection in the local database. Sure enough, this collection did not have any indexes, and my throughput was much higher!
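
For reference, a minimal shell sketch of that second setup (the collection name here is made up; per the documentation quoted above, a capped collection created in the local database on 2.2 gets no _id index):

    > use local
    switched to db local
    > db.createCollection("capped_test", { capped : true, size : 100000000 })
    { "ok" : 1 }
    > db.capped_test.getIndexes()
    [ ]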

Perhaps the overview of capped collections at http://docs.mongodb.org/manual/core/capped-collections/ should clarify this point.

Trade-Ideas Philip
  • Also, note that you should compare *inserting + removing old data*, not just *inserting* - this is what a capped collection does automatically. – johndodo Jan 27 '13 at 21:31
  • This is exactly where we're at as well. We're finding capped collections are killing our performance. We thought it was the index and were properly outraged by https://jira.mongodb.org/browse/SERVER-2048 but then noticed that our uncapped collections had an _id index as well. But then we realized that we weren't removing old data from the uncapped collections. And the rage returned. – ayang Oct 10 '13 at 14:47

Capped collections guarantee preservation of the insertion order. As a result, queries do not need an index to return documents in insertion order. Without this indexing overhead, they can support higher insertion throughput.

According to the definition above, if you don't have any indexes, inserting into a capped collection does not have to be faster than inserting into a normal collection. So if you don't have any indexes, and you don't have any other reason to use a capped collection (such as caching or showing the last n elements), I would suggest going with regular collections.

Capped collections guarantee that insertion order is identical to the order on disk (natural order) and do so by prohibiting updates that increase document size. Capped collections only allow updates that fit the original document size, which ensures a document does not change its location on disk.
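
A quick shell illustration of those two points (the field names below are made up):

    // newest documents first, in insertion (natural) order -- no index needed
    db.capped.find().sort({ $natural : -1 }).limit(5)

    // an update that would grow a document is rejected on a capped collection;
    // same-size, in-place updates are allowed
    db.capped.update({ seq : 1 }, { $set : { note : "longer than the original document" } })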

cubbuk
  • If you don't have an index, insertion will be faster because you don't have to update that index. Maintaining insertion order does not add any additional overhead: you simply can't delete a document in a capped collection or increase its size, so a document always stays in the same position on disk. That is how the order is maintained. – wombatonfire Mar 09 '16 at 17:24