0

** Quick summary: C++ app loading data from SQL server using using OTL4, writing to Mongo using mongocxx bulk_write, the strings seem to getting mangled somehow so they don't work in the aggregation pipeline (but appear fine otherwise). **

I have a simple Mongo collection which doesn't seem to behave as expected with an aggregation pipeline when I'm projecting multiple fields. It's a trivial document, no nesting, fields are just doubles and strings.

First 2 queries work as expected:

> db.TemporaryData.aggregate( [ { $project :  {  ParametersId:1 } } ] )
{ "_id" : ObjectId("5c28f751a531251fd0007c72"), "ParametersId" : 526988617 }
{ "_id" : ObjectId("5c28f751a531251fd0007c73"), "ParametersId" : 526988617 }
{ "_id" : ObjectId("5c28f751a531251fd0007c74"), "ParametersId" : 526988617 }
{ "_id" : ObjectId("5c28f751a531251fd0007c75"), "ParametersId" : 526988617 }
{ "_id" : ObjectId("5c28f751a531251fd0007c76"), "ParametersId" : 526988617 }

> db.TemporaryData.aggregate( [ { $project :  {  Col1:1 } } ] )
{ "_id" : ObjectId("5c28f751a531251fd0007c72"), "Col1" : 575 }
{ "_id" : ObjectId("5c28f751a531251fd0007c73"), "Col1" : 579 }
{ "_id" : ObjectId("5c28f751a531251fd0007c74"), "Col1" : 616 }
{ "_id" : ObjectId("5c28f751a531251fd0007c75"), "Col1" : 617 }
{ "_id" : ObjectId("5c28f751a531251fd0007c76"), "Col1" : 622 }

But then combining doesn't return both the fields as expected.

> db.TemporaryData.aggregate( [ { $project :  {  ParametersId:1, Col1:1 } } ] )
{ "_id" : ObjectId("5c28f751a531251fd0007c72"), "ParametersId" : 526988617 }
{ "_id" : ObjectId("5c28f751a531251fd0007c73"), "ParametersId" : 526988617 }
{ "_id" : ObjectId("5c28f751a531251fd0007c74"), "ParametersId" : 526988617 }
{ "_id" : ObjectId("5c28f751a531251fd0007c75"), "ParametersId" : 526988617 }
{ "_id" : ObjectId("5c28f751a531251fd0007c76"), "ParametersId" : 526988617 }

It seems to be specific to the ParametersId field, for instance if I choose 2 other fields it's OK.

> db.TemporaryData.aggregate( [ { $project :  {  Col1:1, Col2:1 } } ] )
{ "_id" : ObjectId("5c28f751a531251fd0007c72"), "Col1" : 575, "Col2" : "1101-2" }
{ "_id" : ObjectId("5c28f751a531251fd0007c73"), "Col1" : 579, "Col2" : "1103-2" }
{ "_id" : ObjectId("5c28f751a531251fd0007c74"), "Col1" : 616, "Col2" : "1300-3" }
{ "_id" : ObjectId("5c28f751a531251fd0007c75"), "Col1" : 617, "Col2" : "1300-3" }
{ "_id" : ObjectId("5c28f751a531251fd0007c76"), "Col1" : 622, "Col2" : "1400-3" }

For some reason when I include ParametersId field, all hell breaks loose in the pipeline:

> db.TemporaryData.aggregate( [ { $project :  {  ParametersId:1, Col2:1, Col1:1, Col3:1 } } ] )
{ "_id" : ObjectId("5c28f751a531251fd0007c72"), "ParametersId" : 526988617, "Col1" : 575 }
{ "_id" : ObjectId("5c28f751a531251fd0007c73"), "ParametersId" : 526988617, "Col1" : 579 }
{ "_id" : ObjectId("5c28f751a531251fd0007c74"), "ParametersId" : 526988617, "Col1" : 616 }
{ "_id" : ObjectId("5c28f751a531251fd0007c75"), "ParametersId" : 526988617, "Col1" : 617 }
{ "_id" : ObjectId("5c28f751a531251fd0007c76"), "ParametersId" : 526988617, "Col1" : 622 }

DB version and the data:

> db.version()
4.0.2
> db.TemporaryData.find()
{ "_id" : ObjectId("5c28f751a531251fd0007c72"), "CellId" : 998909269, "ParametersId" : 526988617, "Order" : 1, "Col1" : 575, "Col2" : "1101-2", "Col3" : "CHF" }
{ "_id" : ObjectId("5c28f751a531251fd0007c73"), "CellId" : 998909269, "ParametersId" : 526988617, "Order" : 1, "Col1" : 579, "Col2" : "1103-2", "Col3" : "CHF" }
{ "_id" : ObjectId("5c28f751a531251fd0007c74"), "CellId" : 998909269, "ParametersId" : 526988617, "Order" : 1, "Col1" : 616, "Col2" : "1300-3", "Col3" : "CHF" }
{ "_id" : ObjectId("5c28f751a531251fd0007c75"), "CellId" : 998909269, "ParametersId" : 526988617, "Order" : 36, "Col1" : 617, "Col2" : "1300-3", "Col3" : "CHF" }
{ "_id" : ObjectId("5c28f751a531251fd0007c76"), "CellId" : 998909269, "ParametersId" : 526988617, "Order" : 1, "Col1" : 622, "Col2" : "1400-3", "Col3" : "CHF" }

Update: enquoting the field names makes no difference. I'm typing all the above in the mongo.exe command line, but I see the same behavior in my C++ application with a slightly more complex pipeline (projecting all fields to guarantee order).

This same app is actually creating the data in the first place - does anyone know anything which can go wrong? All using the mongocxx lib.

** update **

Turns out there's something going wrong with my handling of strings. Without the string fields in the data, it's all fine. So I've knackered my strings, somehow, even though they look and behave correctly in other ways they don't play nice with the aggregation pipeline. I'm using mongocxx::collection.bulk_write to write standard std::strings which are being loaded from sql server through the OTL4 header. In-between there's a strncpy_s when they get stored internally. I can't seem to create a simple reproducible example.

Mark Terry
  • 51
  • 1
  • 4

2 Answers2

0

Just to be safe that there is no conflict with anything else, try using the projection with a strict formatted json: (add quotes to keys)

db.TemporaryData.aggregate( [ { $project :  {  "ParametersId":1, "Col1":1 } } ] )
Nuno Sousa
  • 832
  • 7
  • 15
  • Thanks - I just tried that but makes no difference. Will update question. – Mark Terry Dec 31 '18 at 10:17
  • I actually tried this in 3.6, and I do not get any problems with projection. I used same data, queries with quotes. Everything works as it's supposed to. :( – Nuno Sousa Dec 31 '18 at 22:22
0

Finally found the issue was corrupt documents, which because I was using bulk_write for the insert were getting into the database but causing this strange behavior. I switched to using insert_many, which threw up the document was corrupt, and then I could track down the bug.

The docs were corrupt because I was writing the same field-value data multiple times, which seems to be break the bsoncxx::builder::stream::document I was using to construct them.

Mark Terry
  • 51
  • 1
  • 4