3

I have one multidimensional OLAP Cube for Sales.But I have huge database.Firstly I done olap cube full processing .But everytime it will do full so it will take lots of time.I have to do incremental processing.But I dont have any expirence for this subject.Can you help me ?

How should I follow a path? I found some articles deal with this subject for example this one.

But I dont know what will I write to where condition in partition query.

enter image description here

iehrlich
  • 3,572
  • 4
  • 34
  • 43
Sezer Erdogan
  • 167
  • 1
  • 9
  • 34

1 Answers1

3

A typical way to partition fact tables is by date. You have a Sales_TransactionDate column in your source, so this would be an obvious choice as a partitioning attribbute.

Depending on your volume of data, and thus the number of partitions you want to create, you could partition by Year, Month, Day or anything in between.

The idea is that you'd process the entire cube, just once. Then, every (for example) night, you re-process just the partition for the current (for example) month. This works, only if it's true that older data (i.e. data up to the end of last (for example) month) never changes in the source system. If it did change, you'd miss the changes, because last month's partition is no longer being processed.

So this is an important gotcha to incremental processing. You have to know how long after it first appears data in the source system can ever be changed (obviously, only changes which are relevant to the cube matter - if some column which the cube doesn't use changes, that doesn't matter), and at what stage it settles down into an unchanging state.

It's an ETL question, related to how (if at all) you're using Type2 slowly-changing attributes, and to whether the source system has any indication of when a row was last updated (e.g. a LastUpdated datetime column).

(Edit - according to comment below)

You need to adjust your partition size so that you're guaranteed to capture all possible changes just by processing the most recent partition. For example, if a row can change up to 6 months after its Transaction Date (or whatever date you're using to partition by), you have to process the last 6 months of data to not miss any changes.

But this constraint only affects the size of the most recent partition - older partitions can be sized as you like. You can reduce the amount of processing in the most recent partition, if there's a mechanism in the source system to mark rows as "changed". (One example is a "LastUpdated" column, which is always set to the current date/time when the row is updated. Another one is SQL CDC).

SebTHU
  • 1,385
  • 2
  • 11
  • 22
  • Hi SebTHU,can you give another suggestion for incremental olap cube ? – Sezer Erdogan Apr 24 '17 at 14:48
  • I need one example – Sezer Erdogan Apr 25 '17 at 11:56
  • 1
    @SebTHU - To clarify, LastUpdated should only be used once it becomes static. If LastUpdated changes after the cube is processed, the old partition will need to be processed again with the new partition to ensure the record doesn't get counted twice. – Wes H Apr 26 '17 at 14:26
  • @WesH: you're right, that last paragraph was confused. Edited the answer to reflect your point. – SebTHU Apr 28 '17 at 09:37