How to improve cubejs preagregations creation process? (its taking too long to build preaggs even with partitionGranularity)

Question

We are having trouble with the preaggregations creation performance. We currently have specific filters for the data for each one of our clients, and we generate different cubes for each one of them by extending a base cube (called Metrics) and defining a segment that represents those filters.

To summarize, we have a Metrics base cube, and we generate dynamic cubes MetricsA, MetricsB, MetricsC for clients A, B, C. Each one of these cubes has a segment that we call z, which contains a specific SQL query for each of our clients. The data to build the segment is retrieved from our API using asyncModule, and then we extend the Metrics cube to generate all the clients specific cubes by overriding the z segment with the client's filter. By doing this, when a client queries the cube service, the data retrieved will come from their specific cube, with the data already filtered (by the enforced z segment).

This Metrics cube is built by joining large tables, so we also added a partitionGranularity (monthly) to reduce the size of the preaggregations, but they are still taking too long to build (> 10 minutes).
We need to edit the specific query that the cube service submits to create the preaggregation tables, so we only keep the rows with the z segment = 1 (because that is the relevant data), or at least we want to be able to rearrange/modify the query to improve performance. Which is the best place to do such changes? or what is the recommended practice to intervene this process?

score 0 · Answer 1 · answered Feb 17 '20 at 20:15

There're two approaches you can use to leverage pre-aggregations in multi-tenant environments.

Override sql for each customer cube such as OrdersC1, OrdersC2, etc. In this case all pre-aggregations defined in base Orders cube will be inherited. Each customer cube will have it's own set of pre-aggregations. It means if there're N customers and M pre-aggregations then N * M pre-aggregation tables should be built which can be costly in some scenarios.

cube(`Orders`, {
  sql: `SELECT * FROM orders`,

  preAggregations: {
    date: {
      type: `rollup`,
      measureReferences: [someMeasure],
      dimensionReferences: [someDimension],
      timeDimensionReference: date,
      granularity: `month`
    },
    // ...
  }
});

cube(`OrdersC1`, {
  extends: Orders,
  sql: `SELECT * FROM orders WHERE customer_id = 'C1'`,
});

Use tenant field as a dimension of rollup. Every segment can be converted to dimension which provides an opportunity to use single rollup table for all customers. To route requests to right tenant data queryTransformer can be used.

cube(`Orders`, {
  sql: `SELECT * FROM orders`,

  // ...

  dimensions: {
    // ...

    customerId: {
      sql: `customer_id`,
      type: `string`
    }
  },

  preAggregations: {
    date: {
      type: `rollup`,
      measureReferences: [someMeasure],
      dimensionReferences: [customerId, someDimension],
      timeDimensionReference: date,
      granularity: `month`
    },

    // ...
  }
});

How to improve cubejs preagregations creation process? (its taking too long to build preaggs even with partitionGranularity)

1 Answers1