1

I'm using cube.js with MongoDB through MongoDB Connector for BI and MongoBI Driver and so far so good. I'd like to have a cube.js numerical measure that counts the element length from a MongoDB array of object nested property. Something like:

{
  "nested": {
    "arrayPropertyName": [
      {
        "name": "Leatha Bauch",
        "email": "Leatha.Bauch76@hotmail.com"
      },
      {
        "name": "Pedro Hermiston",
        "email": "Pedro76@hotmail.com"
      }
    ]
  }
}

I wasn't able to figure that out looking at the docs and I was wondering if that is even possible.

I tried with type: count:

    MyNestedArrayPropertyCounter: {
      sql: `${CUBE}.\`nested.arrayPropertyName\``,
      type: `count`,
      format: `number`,
    },

but I'm getting

Error: Error: Unknown column 'nested.arrayPropertyName' in 'field list'

Any help/advice is really appreciated. Thanks

Luca Anceschi
  • 2,257
  • 3
  • 19
  • 24

2 Answers2

1

BI treats nested arrays as separate relational tables. See https://www.mongodb.com/blog/post/introducing-the-mongodb-connector-for-bi-20

That's why you get unknown column error, it's not part of the parent document table.

So my guess you have to build schema on the nested array and then build measure count with dimension on parent object id.

Hope it halps.

Michael Parshin
  • 345
  • 2
  • 5
0

I followed Michael Parshin's advice and here's my findings and outcomes to overcome the problem:

  1. LEFT JOIN approach with cube.js joins. I found it painfully slow and most of the time it endend out in a timeout even when querying was performed through command line SQL clients;

  2. Launch mongosqld with --prejoin flag. That was a better option since mongosqld automatically adds master table columns/properties to the secondary tables thus enabling you to conveniently query cube.js measures without joining a secondary Cube;

  3. Wrote a mongo script that fetch/iterate/precalc and persist nested.arrayPropertyName count in a separate property of the collection documents.

Conclusion

Leaving out option 1, option 3 significantly outperforms option 2, typically less than a seconds against more than 20 seconds on my local machine. I compared both options with the same measure, different timeDimension ranges and granularity.

Most probably I'll incorporate array count precalculation into mongo document back-end persisting logic.

Luca Anceschi
  • 2,257
  • 3
  • 19
  • 24