
So, I've been trying out the PHP framework Lithium, and it seems like a really good framework, but I have a problem: a query I run on a collection with only ~6k documents is amazingly slow from PHP but blazingly fast when I run it from the terminal.

One document in the collection may look like this:

{
    "_id" : ObjectId("504c9a3b6070d8b7ea61938e"),
    "startDate" : "Jan 2011",
    "episodes" : [
        {
            "title" : "Series 1, Episode 1",
            "airdate" : ISODate("2011-01-20T00:00:00Z"),
            "epnum" : "1",
            "prodnum" : null,
            "seasonnum" : "01",
            "link" : "http://www.tvrage.com/10_OClock_Live/episodes/1065007783"
        },
        {and maybe 20 more},
    ],
    "runTime" : "60 min",
    "endDate" : "Apr 2012",
    "network" : "Channel 4",
    "numberOfEpisodes" : "25 eps",
    "title" : "10 O'Clock Live",
    "directory" : "10OClockLive",
    "country" : "UK",
    "tvrage" : "27363"
}

I want to get all episodes that exist in the current month. So in the terminal (using fake values and a range wider than a month) I run the following query:

db.series.find({'episodes.airdate': {$gt: ISODate('2012-09-07 00:00:00'), $lt: ISODate('2012-11-01')}})

And wham, it just comes back instantly. Even an explain() on the query confirms that it's fast:

{
    "cursor" : "BtreeCursor episodes.airdate_1",
    "isMultiKey" : true,
    "n" : 382,
    "nscannedObjects" : 1620,
    "nscanned" : 1620,
    "nscannedObjectsAllPlans" : 1620,
    "nscannedAllPlans" : 1620,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    **"millis" : 181**,
    "indexBounds" : {
        "episodes.airdate" : [
            [
                ISODate("2012-09-07T00:00:00Z"),
                ISODate("292278995-01--2147483647T07:12:56.808Z")
            ]
        ]
    },
    "server" : "example:27017"
}

But when I run the same query through PHP and Lithium, man, it takes ages:

$series = Series::find('all', array(
    'fields' => array('title', 'episodes.title', 'episodes.airdate'),
    'conditions' => array('episodes.airdate' => array(
        '$gt' => new MongoDate(strtotime(date('Y-m-01'))),
        '$lt' => new MongoDate(strtotime(date('Y-m-t'))),
    )),
));

And if I try to loop through the result it gets even worse, well past the 30-second execution time. I also think I have a memory leak, since I had to add ini_set('memory_limit', '-1'); to avoid hitting the maximum memory usage error.

Could anyone give me an answer as to why this is happening? Is there any way to improve the speed of the query? I have no idea why it is so slow, and I would be super glad if anyone could point me in the right direction.

Ms01
  • one thing you can do is look in the mongo log - it will print the query that's being run and how long it took, might be worthwhile to see what it's printing when the PHP query is passed back. – Asya Kamsky Oct 07 '12 at 21:32
  • Woah, just found a surprising thing. In the fields, if I skip anything that's inside `episodes` it goes fast as hell. But if I try to get anything inside `episodes` it takes forever. Waiting for a query to run to see how long it will take. – Ms01 Oct 07 '12 at 21:36
  • I think this is from a php-query: `Sun Oct 7 23:21:27 [conn17] query thismonth.series query: { query: { episodes.airdate: { $gt: new Date(1346976000000), $lt: new Date(1351728000000) } }, $explain: true } ntoreturn:0 ntoskip:0 nscanned:1620 keyUpdates:0 locks(micros) r:182062 nreturned:1 reslen:646 182ms` but it also is some interesting things after that: http://pastie.org/private/2lsdskg10fjjkkwopfmq4q – Ms01 Oct 07 '12 at 21:38
  • FYI, when you loop a mongo db result in lithium in a foreach loop, it caches the documents in the DocumentSet object. So that's why you run out of memory when dealing with 6k documents.. – rmarscher Oct 08 '12 at 00:23

3 Answers


The issue is that Lithium boxes all the data in objects, which for large result sets is very memory-intensive, hence slow. If you don't need any ActiveRecord features for that particular query, there's an option you can pass to find() which gets passed through to MongoDb::read() (so check the docs for MongoDb::read()) that lets you get back either a raw array or the actual database cursor, which you can iterate over manually.

The other option is to wait till I implement streaming iteration, which will solve the memory problem. :-)
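
Based on the comments below, a rough sketch of what that could look like (the `'calculate' => true` option comes from those comments, and the fields/conditions mirror the question; verify the option name against your Lithium version):

// Sketch only: per the comments, 'calculate' => true makes the query
// return the underlying MongoCursor instead of boxed Document objects.
$cursor = Series::all(array(
    'calculate'  => true,
    'fields'     => array('title', 'episodes.title', 'episodes.airdate'),
    'conditions' => array('episodes.airdate' => array(
        '$gt' => new MongoDate(strtotime(date('Y-m-01'))),
        '$lt' => new MongoDate(strtotime(date('Y-m-t'))),
    )),
));

foreach ($cursor as $doc) {
    // $doc is a plain array straight from the driver, no object boxing
}

// Or, if you really need everything in memory at once:
// $docs = iterator_to_array($cursor);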

Nate Abele
  • The `MongoDb::read()` option that returns the raw array is `calculate`. Hence `Series::all(array('calculate' => true, 'conditions' => $conditions));` – michaeltwofish Oct 12 '12 at 01:14
  • 1
    Correction, `'calculate' => true` gives you a MongoCursor. I don't believe it's possible to return a raw array, but you can convert a cursor to an array with `iterator_to_array($cursor)`. – michaeltwofish Oct 23 '12 at 06:14

I'm not sure why this is slow for you. I have a gist here with a class that will log the insert, read, and update commands issued to Mongo from Lithium. You could probably add some kind of timer to that to get the length of each query. Then you would at least know whether the problem is waiting on Mongo or in other parts of the code.
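
If you just want rough timings without pulling in the gist, another option is to wrap the adapter's read() in a filter; this is only a sketch, assuming Lithium 0.x's filter API and a connection named 'default' (adjust to your setup):

use lithium\data\Connections;

// Sketch only: log how long every read() through the default connection takes.
Connections::get('default')->applyFilter('read', function($self, $params, $chain) {
    $start  = microtime(true);
    $result = $chain->next($self, $params, $chain);
    error_log(sprintf('MongoDb::read() took %d ms', (microtime(true) - $start) * 1000));
    return $result;
});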

Here is some code for iterating over a DocumentSet while discarding each document retrieved from the MongoCursor as you loop.

$docs = SomeModel::all();
while ($docs->valid()) {
    $key = $docs->key();
    $doc = $docs->current();

    // Remove the document from the DocumentSet's internal cache so memory
    // stays flat, then reset the pointer that unset() disturbed.
    unset($docs[$key]);
    $docs->rewind();

    if (!$docs->valid()) {
        // Nothing left in the cache, so advance the underlying MongoCursor
        // to load the next document.
        $docs->next();
    }

    // ... do stuff with $doc here ...
}
rmarscher
  • unfortunately, your code didn't help very much. Haven't tried a timer function yet, but I think I'll just pass and use MySQL with Grails instead. It's sad, since I wanted to use MongoDB, but there just aren't enough supportive frameworks out there yet. – Ms01 Oct 08 '12 at 20:54

I just solved an issue where a page was taking us more than 65 seconds to load. It turned out the user record for this particular user had an embedded array with 152 items, each of them very big, so the document itself was huge, probably pushing toward MongoDB's 16 MB document limit. When I deleted the large array from the user account, the page suddenly loaded in 4.5 seconds.

The thing is, the content on the page being loaded was unrelated to this user record, so we were tuning the queries for that content to try to speed it up. Then we found out the slowdown was completely unrelated to all of that; it was caused by this other issue.

So make sure your records don't get too big.
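
One quick way to spot oversized documents (a sketch assuming the legacy mongo extension, which provides bson_encode(), and the database/collection names from the question's log output; the 1 MB threshold is arbitrary):

// Sketch only: flag documents whose BSON representation is larger than ~1 MB.
$mongo      = new Mongo();
$collection = $mongo->selectDB('thismonth')->selectCollection('series');

foreach ($collection->find() as $doc) {
    $bytes = strlen(bson_encode($doc));
    if ($bytes > 1024 * 1024) {
        echo $doc['_id'] . ': ' . round($bytes / 1024) . " KB\n";
    }
}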

codemonkey613