4

I'm using Doctrine MongoDB ODM to fetch a small number of documents from a remote MongoDB database.

I confirmed the query took only 1ms to find about 12 matching docs. (i.e. 'millis':1 from explain ouput). But iterating through the results took around 250ms.

I couldn't get any performance gain when I tried combinations of the following options

  • select('name')
  • hydrate(false)
  • eagerCursor(true)
  • limit(1)

How can I minimize this delay?


UPDATE: More explanation with sample code

$qb = $dm->createQueryBuilder('Books');
$books = $qb->select('name')
            ->field('userId')->equals(123)
            ->field('status')->equals('active')
            ->eagerCursor(true)  // Fetch all data at once
            ->getQuery()
            ->execute();

/**
 * Due to using Eager Cursor, the database connection should be closed and
 * all data should be in memory now.
 */

// POINT A
foreach($books as $book) {
    // I do nothing here. Just looping through the results.
}
// POINT B.

/**
 * From POINT A to POINT B takes roughly 250ms when the query had 12 matching docs.
 * And this doesn't seem to be affected much by the number of records matched.
 * As the data is already in the memory, I expected this to be done in range of 
 * 5~10ms, not 250ms.
 *
 * Am I misunderstanding the meaning of Eager Cursor?
 */
Jaepil
  • 390
  • 5
  • 20
  • What is the query? Is there only one collection involved? – Thilo Jan 07 '13 at 06:47
  • Yes, only one collection. Basically I'm fetching records that matches {userId:123, status:'active'}. I already have an index for these fields. {userId:1, status:1} – Jaepil Jan 07 '13 at 06:50
  • What happens if you skip the ODM and just query and iterate via the Mongo console? –  Jan 07 '13 at 10:02
  • What driver version is this on? – Sammaye Jan 07 '13 at 10:47
  • @AmigableClarkKant, please check my updated question. Network latency shouldn't have any effects on the **foreach** section. – Jaepil Jan 07 '13 at 10:49
  • @IsaacCambron, thanks for the suggestion, but that won't explain why I'm getting delays in the **foreach** loop. Also I cannot accurately measure the elapsed time in Mongo console. If you have know how, please let me know. – Jaepil Jan 07 '13 at 10:54
  • 1
    Probably, when you querying mongodb it returns you a [mongo cursor](http://docs.mongodb.org/manual/core/read-operations/#read-operations-cursors) instead of real data, but when you iterating over the results mongo diver actually downloads all the data. I'm not familiar with PHP mongo driver, so I'm not sure. – Leonid Beschastny Jan 07 '13 at 10:57
  • @Sammaye, the driver version is 1.2.10. It's slightly outdated, I guess. – Jaepil Jan 07 '13 at 10:58
  • Can you look at your mongodb log to see if you are doing a `getMore` operation when you use the `foreach()`, I don't think it will but still good to check that it isn't doing a round trip for every iteration. Also expanding from Isaacs question, what if you take off doctrine and just do it with only the PHP driver? I have never known connection problems with 1.2 (I am on 1.2 and my connection has always been dirt fast) as such I am reluctant to call it a driver problem. – Sammaye Jan 07 '13 at 11:14
  • Ah sorry I see now, you were using eagercursor, I didn't read that, hmm...I would still ty taking out doctrine for the mion and see if you get fast querying then – Sammaye Jan 07 '13 at 13:30

2 Answers2

0

There are two intervals of interest here: one is from the start of your code to point A; the second is from point A to point B. What's being measured by .explain() is the former; what you're measuring is the latter.

In particular, once the BSON documents have been transferred to the client (your PHP program) they still need to be deserialized and converted into PHP objects. If you're using Doctrine, that has to perform additional processing as well.

The time you're seeing is the time for the deserialization process. How big are the documents? The entire content of the document must be deserialized: if these are large or complex (deeply nested, many arrays, etc.) then this can take some time.

You can reduce the deserialization time by only fetching the fields you need. If you add a ->select('_id') to your query, your loop time should be considerably faster.

William Z
  • 10,989
  • 4
  • 31
  • 25
  • As shown in the sample code, I already select only one field and only about 12 docs are returned. Deserializing these doc should be instant. – Jaepil Jan 08 '13 at 22:18
0

OK, looks like I misunderstood the meaning of Eager Cursor.

http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/eager-cursors.html This document suggests that when execute() is called all results are retrieved into the memory. But that's not quite accurate. Instead, I found the records were fetched when the EasgerCursor was accessed for the first time.

In EagerCursor's initialize() function, the instance of a normal Mongo Cursor is retrieved and eventually passed into iterator_to_array() function after going through few other functions. I'm pretty sure this is when Mongo Driver does its real work of fetching records.

In my case, this happens in the foreach loop. Hence the delay was observed between Point A and Point B. Just to confirm this, I tried MongoClient implementation and found the overall times were very close between the implementations.

Thanks everyone for the help.

Jaepil
  • 390
  • 5
  • 20