0

I am writing a C++ client that processes a file with 1M rows and inserts them in mongodb. The data is inserted in 6 different tables that follow different accumulation criteria. I accumulate the data of 5000 rows (configurable) in a C++ map and then insert (upsert) into mongodb, rinse and repeat until the end of file.

The code for one of the upserts is:

void Upsert(mongo::DBClientConnection & cmon)
    {
        map<KEY, VALUE>::const_iterator it;
        cmon.ensureIndex("test.WebNavigationUri", BSON("ClientId" << 1 << "Uri" << 1 << "Date" << 1), true);

        for (it=detail.begin(); it!=detail.end(); ++it)
        {
            cmon.update("test.WebNavigationUri",
                BSON("ClientId" << it->first.clientid << "Uri" << it->first.uri << "Date" << it->second.dHora),
                BSON("$inc" << BSON("Time" << it->second.time)),
                true);
        }
    }

The other upsert are identical except for the info stored. My problem is that if I only use 2 tables, the times are reasonably fast (4-8 times faster than MySql), but if I use all 6 the performance degrades.

With 2 tables:

WebAccumulationCategory: 6877ms 261749 items; 1465ms (1465 + 0 + 0 + 0).

WebAccumulationClientId: 5005ms 0 items; 0ms (0 + 0 + 0 + 0).

WebAccumulationPolicy: 5115ms 0 items; 0ms (0 + 0 + 0 + 0).

WebAccumulationVirusName: 0ms 0 items; 0ms (0 + 0 + 0 + 0).

WebNavigationCategory: 5442ms 0 items; 0ms (0 + 0 + 0 + 0).

WebNavigationUri: 9493ms 414610 items; 3666ms (3666 + 0 + 0 + 0).

With 6 tables (same file, tables dropped):

WebAccumulationCategory: 6605ms 261749 items; 1421ms (1421 + 0 + 0 + 0)

WebAccumulationClientId: 5367ms 172873 items; 1753ms (1753 + 0 + 0 + 0)

WebAccumulationPolicy: 6187ms 174941 items; 2086ms (2086 + 0 + 0 + 0)

WebAccumulationVirusName: 12ms 52 items; 0ms (0 + 0 + 0 + 0)

WebNavigationCategory: 18745ms 258072 items; 14510ms (14510 + 0 + 0 + 0)

WebNavigationUri: 33046ms 414610 items; 27317ms (27317 + 0 + 0 + 0)

Times are measured in the program with

boost::posix_time::microsec_clock::local_time();

I use mongodb 2.4.9 in an Oracle Server 6.5 machine. The program uses the C++ driver and runs in the same machine.

As you can see WebNavigationUri goes from 3666ms to 27317ms when I upsert in the other 4 tables, but the time for WebAccumulationCategory remains the same. It would seem there is some sort of interaction between the tables and/or the order in which I make the upserts. Any ideas why is this loss of performance is happening? What can I look for / monitor in the database?

mongostat says:

insert  query update delete getmore command flushes mapped  vsize    res faults  locked db idx miss %     qr|qw   ar|aw  netIn netOut  conn       time  
    *0     *0     *0     *0       0     1|0       0   544m  1.42g   247m      0  test:0.0%          0       0|0     0|0    62b     3k     6   12:09:59  
     3     *0   2696     *0       0     1|0       0   544m  1.42g   254m      4 test:23.5%          0       0|0     0|0   583k     3k     6   12:10:00  
     3     *0  10635     *0       0     2|0       0   544m  1.42g   248m      3 test:80.3%          0       0|0     0|1     2m     3k     6   12:10:01  
    *0     *0  13151     *0       0     1|0       0   544m  1.42g   252m      0 test:80.9%          0       0|0     0|1     2m     3k     6   12:10:02  
    *0     *0  13089     *0       0     1|0       0   544m  1.42g   253m      0 test:81.7%          0       0|0     0|1     2m     3k     6   12:10:03  
    *0     *0  13567     *0       0     1|0       0   544m  1.42g   250m      0 test:81.5%          0       0|0     0|1     2m     3k     6   12:10:04  
    *0     *0  13636     *0       0     1|0       0   544m  1.42g   248m      0 test:81.2%          0       0|0     0|1     2m     3k     6   12:10:05  
    *0     *0  13417     *0       0     2|0       0   544m  1.42g   255m      0 test:78.9%          0       0|0     0|1     2m     3k     6   12:10:06  
    *0     *0  13759     *0       0     1|0       0   544m  1.42g   256m      0 test:81.0%          0       0|0     0|1     2m     3k     6   12:10:07  
    *0     *0  13556     *0       0     1|0       0   544m  1.42g   257m      0 test:80.9%          0       0|0     0|1     2m     3k     6   12:10:08  
insert  query update delete getmore command flushes mapped  vsize    res faults  locked db idx miss %     qr|qw   ar|aw  netIn netOut  conn       time  
    *0     *0  13821     *0       0     1|0       0   544m  1.42g   257m      0 test:81.0%          0       0|0     0|1     2m     3k     6   12:10:09  
    *0     *0  13932     *0       0     1|0       0   544m  1.42g   258m      0 test:81.0%          0       0|0     0|1     2m     3k     6   12:10:10  
    *0     *0  13460     *0       0     2|0       0   544m  1.42g   258m      0 test:81.0%          0       0|0     0|1     2m     3k     6   12:10:11  
    *0     *0  13823     *0       0     1|0       0   544m  1.42g   256m      0 test:80.8%          0       0|0     0|1     2m     3k     6   12:10:12  
    *0     *0  13811     *0       0     1|0       0   544m  1.42g   253m      0 test:80.3%          0       0|0     0|1     2m     3k     6   12:10:13  
    *0     *0  13794     *0       0     1|0       0   544m  1.42g   251m      0 test:81.2%          0       0|0     0|1     2m     3k     6   12:10:14  
    *0     *0  13605     *0       0     1|0       0   544m  1.42g   247m      0 test:79.6%          0       0|1     0|1     2m     3k     6   12:10:15  
    *0     *0  13403     *0       0     2|0       0   544m  1.42g   261m      0 test:79.0%          0       0|0     0|1     2m     3k     6   12:10:16  
    *0     *0  13781     *0       0     1|0       0   544m  1.42g   261m      0 test:81.2%          0       0|0     0|0     2m     3k     6   12:10:17  
    *0     *0  13824     *0       0     1|0       0   544m  1.42g   262m      0 test:80.6%          0       0|0     0|1     2m     3k     6   12:10:18  
insert  query update delete getmore command flushes mapped  vsize    res faults  locked db idx miss %     qr|qw   ar|aw  netIn netOut  conn       time  
    *0     *0  13954     *0       0     1|0       0   544m  1.42g   262m      0 test:81.0%          0       0|0     0|1     2m     3k     6   12:10:19  
    *0     *0  13873     *0       0     2|0       0   544m  1.42g   262m      0 test:81.3%          0       0|0     0|1     2m     3k     6   12:10:20  
    *0     *0  13416     *0       0     2|0       0   544m  1.42g   261m      0 test:79.4%          0       0|0     0|1     2m     3k     6   12:10:21  
    *0     *0  13840     *0       0     1|0       0   544m  1.42g   261m      0 test:81.2%          0       0|0     0|1     2m     3k     6   12:10:22  
    *0      1  13657     *0       0    13|0       0   544m  1.42g   259m      0 test:80.4%          0       0|0     0|1     2m    11k     6   12:10:23  
    *0     *0  12566     *0       0     1|0       0   544m  1.42g   259m      0 test:75.5%          0       0|0     0|1     2m     3k     6   12:10:24  
    *0     *0  13777     *0       0     1|0       0   544m  1.42g   251m      0 test:80.4%          0       0|0     0|1     2m     3k     6   12:10:25  
    *0     *0  13148     *0       0     2|0       0   544m  1.42g   265m      0 test:77.6%          0       0|0     0|1     2m     3k     6   12:10:26  
    *0     *0  13691     *0       0     1|0       0   544m  1.42g   265m      0 test:80.7%          0       0|0     0|1     2m     3k     6   12:10:27  
    *0     *0  13633     *0       0     1|0       0   544m  1.42g   265m      0 test:80.2%          0       0|0     0|0     2m     3k     6   12:10:28  
insert  query update delete getmore command flushes mapped  vsize    res faults  locked db idx miss %     qr|qw   ar|aw  netIn netOut  conn       time  
    *0     *0  13538     *0       0     1|0       0   544m  1.42g   265m      0 test:80.7%          0       0|0     0|1     2m     3k     6   12:10:29  
    *0     *0  13773     *0       0     1|0       0   544m  1.42g   265m      0 test:80.4%          0       0|0     0|1     2m     3k     6   12:10:30  
    *0     *0   9118     *0       0     2|0       0   544m  1.42g   263m      0 test:52.9%          0       0|0     0|1     1m     3k     6   12:10:31  
    *0     *0  11894     *0       0     1|0       0   544m  1.42g   268m      0 test:68.6%          0       0|0     0|1     2m     3k     6   12:10:32  
    *0     *0  13742     *0       0     1|0       0   544m  1.42g   266m      0 test:80.6%          0       0|0     0|1     2m     3k     6   12:10:33  
    *0     *0  13583     *0       0     1|0       0   544m  1.42g   266m      0 test:80.8%          0       0|0     0|1     2m     3k     6   12:10:34  
    *0     *0  13769     *0       0     1|0       0   544m  1.42g   266m      0 test:81.4%          0       0|0     0|1     2m     3k     6   12:10:35  
    *0     *0  13921     *0       0     2|0       0   544m  1.42g   268m      0 test:81.1%          0       0|0     0|1     2m     3k     6   12:10:36  
    *0     *0  13414     *0       0     1|0       0   544m  1.42g   265m      0 test:78.4%          0       0|0     0|1     2m     3k     6   12:10:37  
    *0     *0  13489     *0       0     1|0       0   544m  1.42g   263m      0 test:80.7%          0       0|0     0|1     2m     3k     6   12:10:38  
insert  query update delete getmore command flushes mapped  vsize    res faults  locked db idx miss %     qr|qw   ar|aw  netIn netOut  conn       time  
    *0     *0  13847     *0       0     1|0       0   544m  1.42g   262m      0 test:81.3%          0       0|0     0|1     2m     3k     6   12:10:39  
    *0      3  13776     *0       0     1|0       0   544m  1.42g   266m      0 test:80.6%          0       0|0     0|1     2m     3k     6   12:10:40  
    *0     *0  14000     *0       0     2|0       0   544m  1.42g   251m      0 test:81.1%          0       0|0     0|1     2m     3k     6   12:10:41  
    *0     *0  13416     *0       0     1|0       0   544m  1.42g   270m      0 test:77.8%          0       0|0     0|1     2m     3k     6   12:10:42  
    *0     *0  13583     *0       0     1|0       0   544m  1.42g   270m      0 test:80.8%          0       0|0     0|1     2m     3k     6   12:10:43  
    *0     *0  13854     *0       0     1|0       0   544m  1.42g   272m      0 test:80.9%          0       0|0     0|1     2m     3k     6   12:10:44  
    *0     *0  13742     *0       0     1|0       0   544m  1.42g   270m      0 test:80.6%          0       0|0     0|1     2m     3k     6   12:10:45  
    *0     *0  13708     *0       0     2|0       0   544m  1.42g   269m      0 test:80.7%          0       0|0     0|1     2m     3k     6   12:10:46  
    *0     *0  13675     *0       0     1|0       0   544m  1.42g   269m      0 test:80.8%          0       0|0     0|1     2m     3k     6   12:10:47  
    *0     *0  11748     *0       0     1|0       0   544m  1.42g   271m      0 test:68.9%          0       0|0     0|1     2m     3k     6   12:10:48  
insert  query update delete getmore command flushes mapped  vsize    res faults  locked db idx miss %     qr|qw   ar|aw  netIn netOut  conn       time  
    *0     *0   9845     *0       0     1|0       1   544m  1.42g   274m      0 test:56.4%          0       0|0     0|1     1m     3k     6   12:10:49  
    *0     *0  13623     *0       0     2|0       0   544m  1.42g   271m      0 test:81.0%          0       0|0     0|1     2m     3k     6   12:10:50  
    *0     *0  13530     *0       0     2|0       0   544m  1.42g   271m      0 test:80.4%          0       0|0     0|1     2m     3k     6   12:10:51  
    *0     *0  14127     *0       0     1|0       0   544m  1.42g   274m      0 test:81.2%          0       0|0     0|1     2m     3k     6   12:10:52  
    *0     *0  13734     *0       0     1|0       0   544m  1.42g   270m      0 test:80.8%          0       0|0     0|1     2m     3k     6   12:10:53  
    *0     *0  13305     *0       0     1|0       0   544m  1.42g   268m      0 test:81.3%          0       0|0     0|1     2m     3k     6   12:10:54  
    *0     *0  13575     *0       0     1|0       0   544m  1.42g   269m      0 test:80.9%          0       0|0     0|1     2m     3k     6   12:10:55  
    *0     *0  13735     *0       0     2|0       0   544m  1.42g   272m      0 test:80.2%          0       0|0     0|1     2m     3k     6   12:10:56  
    *0     *0  13809     *0       0     1|0       0   544m  1.42g   257m      0 test:80.5%          0       0|0     0|1     2m     3k     6   12:10:57  
    *0     *0  13896     *0       0     1|0       0   544m  1.42g   252m      0 test:81.3%          0       0|0     0|1     2m     3k     6   12:10:58  
insert  query update delete getmore command flushes mapped  vsize    res faults  locked db idx miss %     qr|qw   ar|aw  netIn netOut  conn       time  
    *0     *0  13207     *0       0     1|0       0   544m  1.42g   274m      0 test:78.0%          0       0|0     0|0     2m     3k     6   12:10:59  
    *0     *0  14075     *0       0     1|0       0   544m  1.42g   276m      0 test:81.1%          0       0|0     0|1     2m     3k     6   12:11:00  
    *0     *0  11833     *0       0     2|0       0   544m  1.42g   273m      0 test:70.9%          0       0|0     0|1     2m     3k     6   12:11:01  
    *0     *0  13540     *0       0     1|0       0   544m  1.42g   271m      0 test:81.0%          0       0|0     0|1     2m     3k     6   12:11:02  
    *0     *0  13762     *0       0     1|0       0   544m  1.42g   272m      0 test:80.8%          0       0|0     0|1     2m     3k     6   12:11:03  
    *0     *0  13535     *0       0     1|0       0   544m  1.42g   275m      0 test:79.3%          0       0|0     0|1     2m     3k     6   12:11:04  
    *0     *0  13584     *0       0     1|0       0   544m  1.42g   265m      0 test:80.1%          0       0|0     0|1     2m     3k     6   12:11:05  
    *0     *0   9272     *0       0     2|0       0   544m  1.42g   275m      0 test:54.3%          0       0|0     0|1     1m     3k     6   12:11:06  
    *0     *0   8524     *0       0     1|0       0   544m  1.42g   271m      0 test:52.8%          0       0|0     0|1     1m     3k     6   12:11:07  
    *0     *0  13772     *0       0     1|0       0   544m  1.42g   272m      0 test:81.6%          0       0|0     0|1     2m     3k     6   12:11:08  
insert  query update delete getmore command flushes mapped  vsize    res faults  locked db idx miss %     qr|qw   ar|aw  netIn netOut  conn       time  
    *0     *0  11479     *0       0     1|0       0   544m  1.42g   271m      0 test:65.8%          0       1|0     0|1     2m     3k     6   12:11:09  
    *0     *0  13835     *0       0     1|0       0   544m  1.42g   252m      0 test:80.8%          0       0|0     0|1     2m     3k     6   12:11:10  
    *0     *0  13597     *0       0     2|0       0   544m  1.42g   276m      0 test:80.2%          0       0|0     0|1     2m     3k     6   12:11:11  
    *0     *0  13704     *0       0     1|0       0   544m  1.42g   275m      0 test:80.9%          0       0|0     0|1     2m     3k     6   12:11:12  
    *0     *0  14127     *0       0     1|0       0   544m  1.42g   279m      0 test:81.2%          0       0|0     0|1     2m     3k     6   12:11:13  
    *0     *0  13144     *0       0     1|0       0   544m  1.42g   275m      0 test:78.1%          0       0|0     0|1     2m     3k     6   12:11:14  
    *0     *0  13680     *0       0     1|0       0   544m  1.42g   273m      0 test:80.9%          0       0|0     0|1     2m     3k     6   12:11:15  
    *0     *0  13491     *0       0     2|0       0   544m  1.42g   273m      0 test:78.9%          0       0|0     0|0     2m     3k     6   12:11:16  
    *0     *0     *0     *0       0     1|0       0   544m  1.42g   265m      0     .:0.1%          0       0|0     0|0    62b     3k     5   12:11:17  
    *0      1     *0     *0       0    13|0       0   544m  1.42g   258m      0 config:0.1%          0       0|0     0|0   892b    11k     5   12:11:18 

Blocking is at 80% but I guess it's normal given that I am upserting 13K records per second.

smp
  • 61
  • 4
  • Nice stats, but something just flew *way* over my head. Not sure what that was. What exactly are you asking? And how are **we** supposed to analyse this? Think about that. This question needs editing. – Neil Lunn Mar 12 '14 at 14:41

2 Answers2

1

Blockquote Any ideas why is this loss of performance is happening? What can I look for / monitor in the database?

You might want to run mongostat (http://docs.mongodb.org/manual/reference/program/mongostat/ ):

 mongostat --username <user> --password <pass>

and pay attention to the number of faults, if not zero your working set does not fit in memory hence data access becomes very slow. Solution would be to work with your index, provide more RAM and/or start sharding your database

mcorbe
  • 165
  • 1
  • 8
  • Thanks. I have updated the question with the mongostat info. I was hoping to see lots of write queueing but no. – smp Mar 17 '14 at 10:35
  • It looks like here you have a lot of locks > 80%, that means that you read / write access are delayed until locks are gone (http://docs.mongodb.org/manual/faq/concurrency/). Updates are lock intensive and if you attempt to acces multiple time the same document you might spend most of your time waiting for the lock to be released. – mcorbe Mar 17 '14 at 15:55
  • A solution could be to work on your document format and make sure that you would need only one update query per document in mongo. – mcorbe Mar 17 '14 at 15:57
1

Use different databases. The current versions of MongoDB have database level locking so instead of using test for all of them just use test1, test2, testN...

You can also try using the bulk api in the legacy driver (slated for the next release) to avoid the network round trips.

Tyler Brock
  • 29,626
  • 15
  • 79
  • 79