
I'm running an aggregate through PyMongo.

The aggregate, formatted fairly nicely, looks like this:

[{
    $match: {
        syscode: {
            $in: [598.0]
        },
        date: {
            $gte: new Date(1509487200000),
            $lte: new Date(1510264800000)
        }
    }
},
{
    $group: {
        _id: {
            date: "$date",
            start_date: "$start_date",
            end_date: "$end_date",
            daypart: "$daypart",
            network: "$network"
        },
        syscode_data: {
            $push: {
                syscode: "$syscode",
                cpm: "$cpm"
            }
        }
    }
}]
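For reference, here is the same pipeline expressed as a list of Python dicts for PyMongo — a minimal sketch; the database and collection names are taken from the log line below, and the shell's `new Date(ms)` becomes a timezone-aware datetime:

```python
from datetime import datetime, timezone

def ms_to_dt(ms):
    """Convert a Mongo new Date(ms) epoch-milliseconds value to an aware datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

pipeline = [
    {"$match": {
        "syscode": {"$in": [598.0]},
        "date": {"$gte": ms_to_dt(1509487200000),
                 "$lte": ms_to_dt(1510264800000)},
    }},
    {"$group": {
        "_id": {"date": "$date", "start_date": "$start_date",
                "end_date": "$end_date", "daypart": "$daypart",
                "network": "$network"},
        "syscode_data": {"$push": {"syscode": "$syscode", "cpm": "$cpm"}},
    }},
]

# With a pymongo collection, e.g. coll = client["Customer"]["rate_cards"]:
# results = list(coll.aggregate(pipeline))
```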

It returns no results when I use the .explode method on its cursor in Python.

When I run it through NoSQL Booster for MongoDB, I get the results back. That said, the Mongo log lines look the same as the ones I see when I run it through PyMongo.

When I look at the Mongo logs, there's an additional $group stage appended to the pipeline. Apparently NoSQL Booster knows what to do with it and I don't.

{ $group: { _id: null, count: { $sum: 1.0 } } }
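That stage is just a counter: `_id: null` collapses everything into a single group and `$sum: 1.0` adds one per input document — presumably how the Booster computes a result count. A pure-Python sketch of its semantics:

```python
def group_count(docs):
    """Mimic { $group: { _id: null, count: { $sum: 1.0 } } }."""
    count = 0.0
    for _ in docs:       # one group for all documents ...
        count += 1.0     # ... summing 1.0 per document
    return [{"_id": None, "count": count}]

group_count([{"x": 1}, {"x": 2}, {"x": 3}])  # -> [{'_id': None, 'count': 3.0}]
```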

This is the full log line I see.

2018-03-11T21:05:04.374+0200 I COMMAND  [conn71] command Customer.weird_stuff command: aggregate { aggregate: "rate_cards", pipeline: [ { $match: { syscode: { $in: [ 598.0 ] }, date: { $gte: new Date(1509487200000), $lte: new Date(1510264800000) } } }, { $group: { _id: { date: "$date", start_date: "$start_date", end_date: "$end_date", daypart: "$daypart", network: "$network" }, syscode_data: { $push: { syscode: "$syscode", cpm: "$cpm" } } } }, { $group: { _id: null, count: { $sum: 1.0 } } } ], cursor: { batchSize: 1000.0 }, $db: "Customer" } planSummary: COLLSCAN keysExamined:0 docsExamined:102900 cursorExhausted:1 numYields:803 nreturned:1 reslen:134 locks:{ Global: { acquireCount: { r: 1610 } }, Database: { acquireCount: { r: 805 } }, Collection: { acquireCount: { r: 805 } } } protocol:op_query 122ms

What's going on? How do I handle this from the Python side?

Notes as I'm digging: this pipeline runs when I get lucky and pass an ordinary (unordered) dictionary to PyMongo, the default. When I run the input JSON through json.JSONDecoder with the line:

json.JSONDecoder(object_pairs_hook=OrderedDict).decode(parsed_param) 

the output has a very complex format (necessary because the pipeline must preserve the order of its stages) and ends up passing that extra piece along.
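For what it's worth, `json.loads` with the same hook is equivalent to calling the decoder directly (the input string below is a hypothetical stand-in for `parsed_param`); on Python 3.7+ plain dicts preserve insertion order anyway, so the OrderedDict hook is mostly belt-and-braces:

```python
import json
from collections import OrderedDict

parsed_param = '[{"$match": {"syscode": {"$in": [598.0]}}}]'  # hypothetical input

pipeline = json.JSONDecoder(object_pairs_hook=OrderedDict).decode(parsed_param)
same = json.loads(parsed_param, object_pairs_hook=OrderedDict)

assert pipeline == same
assert isinstance(pipeline[0], OrderedDict)  # stage keys keep their order
```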

Dylan Brams

1 Answer


So, lacking other interest, I found a workaround. Examining the problem, I found that when I added an additional stage to the pipeline ({"$sort": {"_id": 1}}), the translation from Python dictionary to Mongo aggregation JSON didn't generate the extra JSON object.

This is a poor answer, but I think the root cause is that the conversion between complex ordered dictionaries and Mongo JSON queries in this particular environment has a small bug that affected this particular query.

I would be excited to go find it and examine it further, but I'm buried at a new job.
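Sketched from the Python side, the workaround just appends the sort stage (the pipeline is shortened here, and the collection call is assumed):

```python
pipeline = [
    {"$match": {"syscode": {"$in": [598.0]}}},
    {"$group": {
        "_id": {"network": "$network"},
        "syscode_data": {"$push": {"syscode": "$syscode", "cpm": "$cpm"}},
    }},
    {"$sort": {"_id": 1}},  # the extra stage that avoided the problem
]

# results = list(coll.aggregate(pipeline))  # coll: your pymongo collection
```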

Dylan Brams