0

I'm looking into moving data from and OLTP to an OLAP to help speed up analytics. I've been researching my current question for 2 days now and have not found a definitive answer anywhere. Currently I have a highly normalized relational schema and I'm having trouble figuring out how to retain hierarchical data when converting the into the fact tables and dimension tables of a star schema. Is there a way to make fact tables completely granular and normalized while maintaining the values of the parent and child tables taken from a relational schema? I'm not sure if I asked that correctly, so here is an example: I'm recording data from fishing trips. I have to record data from the trip and the catch so I store that data into two main tables (below in simplified versions):

Trip(tripid, date, gear quantity, hours fished, participants)

Catch(tripid, species, disposition, count)

How would I create a fact table with this and retain the gear quantity, hours fished, participants, and species counts all while keeping the fact table normalized? The only way I can think of making the fact table would be the following:

Fact(date, species, disposition, gear quantity, hours fished, participants, count)

With the date, species, and disposition being FKs to the dimension tables, and the rest of the columns being quantifiable values. However, this would be completely denormalized. Is this an appropriate way of doing this, or is there another way to keep the relation between the parent and child tables with quantifiable values and have this in a star schema?

RoofusEat
  • 99
  • 1
  • 7
  • 3
    How many species do you have? Does your analytics really need to maintain the relationship between Trip and Catch? Can you give an example of how you intend to display this data in a report? Often times, the final reporting layout is the best way to drive decisions about how to denormalize data for OLAP. – Tab Alleman Mar 22 '18 at 14:58
  • It's not so much that it's denormalized. It's reorganized. What you suggest is the right approach – user1443098 Mar 22 '18 at 15:04
  • Your Catch would be a degenerate dimension and it's OK to have one. Remember that your database schema is highly influenced by your non functional requirementes (performance, availability, etc.). So @TabAlleman's comment is really important. https://en.wikipedia.org/wiki/Degenerate_dimension – EzLo Mar 22 '18 at 15:16
  • at first glans I would say that your Fact table could be just a view with an left join between your 2 tables. Why do you need this in a seperate table ? Maybe it would help if you posted some sample data and expected outcome – GuidoG Mar 22 '18 at 15:19
  • @TabAlleman The species numbers vary per trip. I've seen up to 30 species in one trip, but the average is about 5 different species per trip and species is a common aggregate. The final reporting layout varies quite a bit. Each day the requests vary in any way imaginable. One example that would use this format is year, species, sum(hours fished), sum(count). – RoofusEat Mar 22 '18 at 18:15
  • @EzequielLópezPetrucci In reality the fact table would have >1 degenerate dimension. For instance one of the values associated with species would be price, which is queried often. Or is this considered part of the degenerate dimension since it has a functional relationship with the species? – RoofusEat Mar 22 '18 at 18:21
  • @GuidoG A left join between the two tables would work, but I'm trying to get better performance via a fact table with a columnstore index. I think I'm just having trouble grasping the idea of a star schema versus a 3NF relational schema. The outcomes would vary, and the example was shortened as there would be many other aggregates to use and even more child tables, but the general outcomes would be something along the lines of year, species, sum(DISTINCT hours fished), sum(DISTINCT participants), SUM(count). – RoofusEat Mar 22 '18 at 18:28
  • but maintaining that fact table when anything changes in either of these 2 tables also has a great cost, in work and in performance.So its not sure yet if you gain much going this way – GuidoG Mar 23 '18 at 07:33

1 Answers1

0
Fact(tripid, species, disposition, count)

Dim Trip(date, gear quantity, hours fished)
    Dim Gear(gear quantity) -- not necessary, really/possibly)
    Dim Participants(partipicant)

That should work, without knowing the details of the relationships.

You also say in a comment: '... I'm trying to get better performance...'. I would urge you not to do that until you have a workable schema and can then actually measure the performance you get. Designing up front for performance problems is known as Premature Optimisation and is usually not a good thing.

simon at rcl
  • 7,326
  • 1
  • 17
  • 24
  • Thanks! Is it normal to have a quantifiable value in the dimension table? If so, then I think that might be where my confusion is coming from. As for getting better performance, this is just brain storming for the future. I already have the OLTP schema created running and was just curious about getting this into a data warehouse in the future for quicker analyzing. – RoofusEat Mar 22 '18 at 19:42
  • Hi @RoofusEat - no, it's not normal! I misunderstood the relationship. If gear qty is something to also be aggregated/averaged/whatever then it should be in the fact table: it looks like there should be a 1:1 relationship with the fact table so it should cause no problems. I was assuming that it was a different type of data: something to aggregate by, not be aggragated; if that's wrong then yes, it should be in he fact table. Cheers - – simon at rcl Mar 25 '18 at 14:46