I am building a local OLAP cube based on data gathered from several OLTP sources. Please note that I am doing this programmatically and do not have access to tools like SSAS or MDX-based tools.
My requirements are somewhat different than the operational requirements of the OLTP system users. I know that "in theory" it would be preferable to retain the most atomic grain available to me, but I don't see a reason to include the lowest level of data in the cube.
For example (I am simplifying), I have a measure field like "Price". Additionally, each sales fact has a Version attribute with values such as:
- List (Original/Initial)
- Initial Quote
- Adjusted Quote
- Sold
These describe the internal development of our pricing and are critical to the reports that I create.
However, for my reporting purposes, I will always want to know the value of all Versions whenever I am referencing a given transaction. Therefore, I am considering pivoting measures like Price by Version in the cube (Version will still be its own entity in the data model), resulting in measures like:
- PriceList
- PriceQuotedInitial
- PriceQuotedAdjusted
- PriceSold
Since only one Version is ever effective at a given point in time, we do not need to aggregate across multiple Versions.
Known Advantages
Since this will be a local cube file, it appears this approach would simplify the creation of several required calculated measures that compare Price across different Versions (would not be an issue to create calculated measures at various levels of aggregation if I was doing this with MDX)
It would also reduce the number of records by a factor of between 3 and 6, which would significantly boost performance for a local cube.
Known Disadvantages
While the data model will match the business process, the cube would not store the data at the most atomic level. An analyst would need to distinguish between Versions by Measure selection, and could not filter by Version - they would always get all available Versions.
This approach will greatly increase the number of Measures. For example, there is not just one Price we are tracking, but several price components and other Measures we track for each transaction. So if we track a dozen true Measures for each transaction, that might end up being 50-60 Measures if I take this approach.
I understand that for very large Fact tables, it would be preferable to factor all possible fields out of the Fact table into Dimensions for performance purposes, but I am not sure whether this is the case when using a local cube, as in all likelihood, I will put fewer than 50,000 records into any given cube file, given the limitations of local cubes.
Are there other drawbacks to this approach that I'm missing?