-1

I am going through the exercise posted below. I don't quite understand how to represent the 4th dimension on paper. If it was just customer, date, product that will be straightforward. Do I have to rollup on something in order to get the fourth dimension?

I was going to Drill down to product-cosmetics, dice on time (to get everything after Jan 2010) and slice on cosmetics.

Suppose a market shopping data warehouse consists of four dimensions: customer, date, product, and store, and two measures: count, and avg sales, where avg sales stores the real sales in pounds at the lowest level but the corresponding average sales at other levels.

Starting with the base cuboid [customer, date, product, store], what specific OLAP operations (e.g., roll-up student to department (level)) should be performed in order to list the average sales of each cosmetic product since January 2010? Explain your answer.

Edoardo Moreni
  • 408
  • 2
  • 10
  • 24
  • That is a terribly worded exercise question. What is the "average" in "average sales" to be calculated over - month? store? However, your own question confuses me too - what do you mean "represent the fourth dimension on paper"? This isn't about visually representing the multi-dimensional space described, it's about the steps to arrive at the correct value. For example, "customer" should probably be rolled up to "All customers", since the question doesn't specify any other grouping. – Nathan Griffiths Jan 18 '16 at 00:55

1 Answers1

1

Don't confuse the dimensions in a star schema with dimensions in space and time. Lay this one out visually like a clock, where your fact table is the center of the dial, and your dimensions are at 12, 3, 6 and 9 o'clock.

The star schema has presumably one fact (sales), the grain of which is not defined, but could reasonably be assumed to be day.

The question is imprecise, in that it has not defined the period over which "average" is to be calculated. We could count average sales per day, but since roll-up has been mentioned, and we've been given a month/year starting point, let's assume it is average MONTHLY sales.

Let's also assume that "cosmetics" represents a group of products, and we've got that defined in the product dimension.

Our date dimension could reasonably carry an attribute of year-month. It is common to see this.

An SQL query might then look like this:

select    product.name,calendar.year_month,avg(sales.amount)
from      sales
          left join product 
              on product.id = sales.product_id
          left join calendar 
              on calendar.id = sales.calendar_id
where     calendar.year_month > '2010-01'
          and product.category = 'cosmetics'
group by  product.name,calendar.year_month

Because you've been asked for OLAP operations, the syntax will vary a lot by product. Conceptually your idea was correct, slice on time (year-month) and product (category), and average sales. You can ignore the customer and store dimensions for this query.

Does that make sense?

Ron Dunn
  • 2,971
  • 20
  • 27
  • Hi Ron, thanks. I have always seen a 3D Cuboid represented on a Cartesian axis, that's why I was wondering how to get the fourth dimension. I have also edited my question, I had omitted the first bit. When it says "lowest level", does it mean the lowest level of the cuboid? Even in this case, it's not that clear. – Edoardo Moreni Jan 17 '16 at 20:41
  • Yes, Edoardo, "lowest level" in this case means the individual sales transactions. When the question asks about "other levels", it means that the aggregate information is being stored in the cube, and the "average sales" figure is also being averaged. I guess there is some academic merit to the question, but it is TERRIBLE real-world design :) – Ron Dunn Jan 18 '16 at 05:21
  • Ok, thanks. It makes sense, so can I store more than a single measure (like avg sales and count) in a cell of the cuboid (basically my fact table)? – Edoardo Moreni Jan 18 '16 at 11:30