[SSAS CUBE Partition Strategy for large historical data]

Question

I have a big cube with 2.5 million new data per day. 19 million a week. Those data are historical data, no update , no remove and no change.So what's best partition strategy for this kind of data ? You can see only one week there are a lot of data. Shall I create a new partition everyday to process new data and merge into a static large partition at night ?

what are you trying to accomplish? ...faster processing? ...better query performance (via partition elimination)? — Bill Anton, Feb 03 '15 at 22:13
Need a best practice, yes faster processing and query performance — ewan, Feb 04 '15 at 03:46

score 1 · Accepted Answer · answered Feb 03 '15 at 18:53

1

I think the best solution is to use different ranges:

(Date) -> (Partition)
This week -> Daily (this helps not to reprocess all week and not to use tricks with ProcessAdd and only new data)
This year -> Weekly (53 partitions is ok)
Previous years -> Yearly

At the end of each week you can merge daily partitions. 19 millions per one partition is good, but using weekly basis for older years may cause additional time for querying and processing.

So you'll have less than 100 partitions for entire measure group at least for the nearest 40 years (7 daily + 53 weekly + 40 yearly).

And don't forget to add slices to every created partition.

Removing unnecessary indexes (e.g. for high-selective attributes used as properties etc.) may also help to speed up process time and decrease disk space usage.

answered Feb 03 '15 at 18:53

Alex Peshik

1,515
2
15
20

thank you ! for yearly partition, there will be tons of data, will that be a problem for a cube partition? – ewan Feb 04 '15 at 03:08
It depends how often this historical partitions will be used. If no new data is inserted to them, and rare querying them, it's possible to process all dimensions as 'Process Update' and than 'Process Indexes' to large partitions at the very end of ETL. Or even AFTER reporting users that they can browse the cube. Their queries will be slower fo the time w/o indexes on yearly partitions, but after processing it will be OK as regular partitions. I had the same year's partitions with 2-3 billions of rows, everything was fine. – Alex Peshik Feb 04 '15 at 08:10
But it's better to process them in 'Process Full' mode from time to time (every week/month) to decrease disk space usage. Because every Update/Index process adds new rows and doesn't care about using space inside index. – Alex Peshik Feb 04 '15 at 08:15
but we do partition just want to avoid the "Process Full" on large data, that will take some time to process full on yearly data, right ? – ewan Feb 04 '15 at 08:49
One more question, regarding merging daily to weekly, shall I create a empty weekly partition to merge all daily partitions ? I tried on the AdventureDW cube, used target partition to merge select partition, it won't create a new partition, just moved all data into target partition and removed that selected partition. – ewan Feb 04 '15 at 08:54
Yes, one of the partition benefits is avoiding Process Full, but dimensions are changed from time to time and Process Update causes removing ALL dependent indexes from every partition. That is the reason of using Process Indexes w/o Process Data again on historical partitions. It's acceptable not to do periodical Process Full, but disk space consumption will grow and queries speed decrease. – Alex Peshik Feb 04 '15 at 09:04
Merging. Me personally don't like any merging process except creating brand new partition with new date range, process it and only on success - remove old ones. It's safe and easy rolled back if some issues appear. – Alex Peshik Feb 04 '15 at 09:07
@ Alex Peshik : thank you very much. I learned a lot from you answers. thank you – ewan Feb 09 '15 at 07:34

[SSAS CUBE Partition Strategy for large historical data]

1 Answers1