
I have one fact table that contains all information about how much a company buys and sells. To create calculations such as margin, I need to use the purchase rows together with the sales rows to get the correct results.

Now, I have created a calculated measure that gives me the correct result, but the more dimensions I add to my query, the slower it runs when this calculated measure is used. It seems to spend a lot of time returning the tuples I use to find the purchase rows.

I am using tuples to "store" the purchase row, but the tuple becomes quite large because I need to include the default members of all dimensions used by the sales rows in order for them to be usable. Basically, my tuples look like this, just with more dimension hierarchies:

    (
        [Dimension 1].[Hierarchy 1].&[member]
        ,[Dimension 1].[Hierarchy 2].&[member]
        ,[Dimension 2].[Hierarchy 1].&[member]
        ,[Dimension 3].[Hierarchy 1].&[member]
        ,[Dimension 4].[Hierarchy 1].&[member]
        ,[Measures].[Purchase Standard Cost]
    )

I then multiply this tuple by a measure from the sales rows, and I get my result.
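For illustration, a minimal sketch of what such a calculated measure might look like (all dimension, hierarchy, and measure names here are placeholders, not taken from the actual cube):

    CREATE MEMBER CURRENTCUBE.[Measures].[Margin Amount]
    AS
        -- sales-side measure multiplied by the purchase-side tuple
        [Measures].[Sales Quantity]
        *
        (
            [Dimension 1].[Hierarchy 1].&[member]
            ,[Dimension 2].[Hierarchy 1].&[member]
            ,[Measures].[Purchase Standard Cost]
        ),
    VISIBLE = 1;

Every hierarchy referenced in the tuple is a coordinate the Formula Engine has to resolve for each cell in the result, which is why a wide tuple evaluated over a large cell space can get expensive quickly.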

Does anyone have any tips on how to improve the query performance? The calculation works, and if I slice by just a couple of dimensions performance is not too bad, but the more dimensions I add the slower it gets, and the users will hit performance issues.

Nighty_
  • This is as expected? "more dimensions I add to my query the slower the query runs" – whytheq Oct 14 '15 at 07:34
  • I won't argue with that; of course adding dimensions will slow my query down. However, in this case the performance hit is a LOT worse than I expected. When testing, it is clear to me that the tuples take a lot of time to compute, and I'd like to know if anyone has experience with big tuples affecting performance this badly. – Nighty_ Oct 14 '15 at 07:48
  • With the amount of information you have provided, it is equivalent to shooting in the dark but you can try two things which generally work. 1) Ensure you are not storing decimal values in your dimensions(Performance can hit rock bottom) 2) Creating user defined hierarchies can be sometimes a great performance boosting option.(http://stackoverflow.com/a/7297969/2993606) – SouravA Oct 14 '15 at 12:49
  • @Nighty_ is it one particular dimension which slows things down, when added to the tuple? – whytheq Oct 14 '15 at 13:18
  • @whytheq I've tried different combinations to find that out myself as well, but haven't found any particular dimension that affects performance. None of the dimensions are really that large; I guess the largest is between 200-300 rows, which is quite small. But I guess they all add up in the end, and that is what we're seeing. – Nighty_ Oct 15 '15 at 05:39

1 Answer


Since the number of dimensions used has increased, the Storage Engine has to scan additional files, which could be the reason for such performance degradation.

I have several suggestions, ordered by effectiveness from my point of view:

  1. Implement partitioning (if it's not implemented yet) to reduce the amount of data scanned.
  2. "Materialize" some tuples into a physical dimension (if there are no dynamic, late-binding functions etc. in the MDX):

    2.1. Add corresponding keys, which represent the tuples, to your source tables.

    2.2. Build appropriate dimensions on these keys.

    2.3. Use calculated measures with these "ex-tuples".

Example:

You have a 100M-row table with the columns SomeDate, Customer, Product, Amount, and a single-partition measure group.

You need to create tuples like (2015-01-01, Customer A, Product Z, Amount).

The server has to scan the entire dataset to retrieve the exact values.

  1. Once you add partitions by SomeDate year (+ slices), the server will read only the 2015 partition.

2.1. Add a Tuple_ID int column to the table and map it during ETL.

E.g. Tuple_ID = 1 where Customer = 'Customer A' and Product = 'Product Z'

2.2. Create a dimension on this new field (or on an additional table with the list of combinations, to be able to modify the logic easily).

2.3. Use ([Tuple ID].[Tuple ID].&[1],[Measures].[Amount]) in calculation.

The advantage of this technique is that the server reads only pre-calculated values, so queries speed up as a result.
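Under the hypothetical names above, the change in the calculation would look roughly like this (a sketch, not the exact cube definitions):

    -- Before: a wide tuple the engine must resolve coordinate by coordinate
    (
        [SomeDate].[Date].&[2015-01-01]
        ,[Customer].[Customer].&[Customer A]
        ,[Product].[Product].&[Product Z]
        ,[Measures].[Amount]
    )

    -- After: a single pre-mapped key from the materialized dimension
    ([Tuple ID].[Tuple ID].&[1], [Measures].[Amount])

Since the mapping from (Customer, Product) combinations to Tuple_ID is done once during ETL, the query-time tuple collapses to a single attribute coordinate.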

Alex Peshik
  • Good to see you here after a long time. – SouravA Oct 14 '15 at 13:23
  • Great suggestions. Thank you! My current project actually does not have a proper ETL so I won't have the opportunity to try this, but I'll take them with me. Please don't ask me why it does not have ETL. I was thrown into this as they needed help and their "ETL" is about 10 layers of views from the source to the cube. – Nighty_ Oct 15 '15 at 05:36
  • Thanks, Sourav! Glad you and whytheq are here too. @Nighty_, I understand, I had the same customer one day. We had to add the logic directly to the views; this increases processing time at the Process Data stage, but has no influence later on when querying the cube. – Alex Peshik Oct 15 '15 at 06:33