2

I'm having some trouble deciding what should go in a particular dimension and what should go in a fact table for a star schema I'm developing.

For the sake of example, let's say that the project is keeping track of houses for a property management company. Dimensions like the various dates, renter, contract, etc. are all fairly straightforward. For the house, no matter where the data lives, we want to keep track of the current owner, the current renter, the current rental contract, as well as things like neighborhood, address, current rental price, current market value, and so forth. Note that owner, renter and contract are themselves dimensions (and neighborhood and address may also be dimensions, but I don't care about those so much).

A lot of the data kept about houses will be used in filtering queries, or for the row and column headers of a cube. Some of it is only needed as ancillary information, looked at on a house by house basis, but not in aggregate.

Given the data, and what I need to do with it, I have (at least) three options:

  • DimHouse: house table is a dimension, with a lot of attributes that might look better in a fact table, but since they are used for browsing and filtering, they need to be here. Snowflaking/outriggers will be required for attributes like current renter.
  • FactHouse: have an accumulating snapshot of house information that is joined to other fact tables, perhaps using a trimmed down DimHouse as a bridge. This seems weird to me, but it puts what appear to be facts in a fact table.
    • Put current owner, current renter, etc. in the relevant fact tables and then keep those facts up to date as the owner/renter/etc. change (also weird, but would keep us in star schema land).

So I've been going down the dimension route. It gives me some heart burn, but it achieves the goal. I just want to know if there is a better way to organize the data. I don't mind redundancy (such as having a fact table and a dimension table with similar data) or snowflaking, if they make sense and are the best way to do things (for values of "best").

siride
  • 200,666
  • 4
  • 41
  • 62

1 Answers1

1

The thing about star schema is that it is purpose-built for making certain kinds of queries easy and efficient.

If you're finding that some kinds of queries aren't being helped along by your star because of what is a dimension and what is a fact, then you build additional stars around the alternative views of dimensions and facts that will more easily support the queries you want to perform.

You keep your transactional database normalized. When it comes to your BI datawarehouse you need to let your redundancy anxiety go to avoid the heart burn.

Joel Brown
  • 14,123
  • 4
  • 52
  • 64
  • The redundancy isn't what's giving me heartburn. What's giving me heartburn is my inability (being a n00b in the data warehouse world) to design a set of fact and dimension tables to achieve the goals for tracking and measuring house-related data. Do you think you could speak more to that, that is, be more specific in your middle paragraph? – siride Jun 05 '12 at 15:06
  • @siride - Without knowing more about why you find your current star schema deficient, it's hard to make specific recommendations. There's absolutely nothing particularly wrong with grouping dimensions into "higher level" dimensions, e.g. houses into neighbourhoods, although lots of people denormalize all of that to make automated cube/pivot tools easier to work with. Can you give an example of a kind of query that your current star doesn't support well? – Joel Brown Jun 05 '12 at 16:30
  • it's a design issue. The question is where to put the data in the first place, not how to fix a broken/inefficient query. It sounds like you are advocating snowflaking the house dimension. I just want to make sure that that's the best I can do with the upfront design. – siride Jun 05 '12 at 16:49
  • @siride - Snowflaking is one option, another option is flattening out the snowflake to star. This makes some kinds of operations a little easier (pivoting) and some a little harder (drilling/rolling). Yet another option is to create an entirely new star which repeats everything but reorganized according to different dimensions and facts. This last one solves the problem of how to look at some data which wants to be a dimension sometimes and a fact at other times. – Joel Brown Jun 05 '12 at 17:56