I was taking a look at the Star Schema Benchmark (SSB) and started wondering: is it possible to denormalize all of the tables in the SSB?

The database size would increase a lot, but the performance could potentially improve as well. Is that right? Is it possible?

Thanks and sorry for my poor English.

APC
p.magalhaes
  • IMHO, if you're having to denormalize large parts of your database for speed, you should probably look for a better solution in NoSQL. – Kendall Hopkins Apr 09 '10 at 06:51
  • @KendallHopkins - really? And what NoSQL product would you recommend for implementing a data warehouse which has fact tables containing 100m rows? – APC Apr 09 '10 at 07:06
  • @APC depending on your need for complex queries, Cassandra (http://cassandra.apache.org/) might be a good fit. It scales unbelievably well, but it's very different from traditional relational databases. It's also possible to solve the scaling issue w/ clever sharding and distributed queries. – Kendall Hopkins Apr 09 '10 at 12:50
  • @KendallHopkins - the purpose of a data warehouse is to take a large amount of data and join it in a multitude of different ways, so as to answer complex, frequently ad hoc, queries. Therefore - IMHO - Cassandra or any other product which relies on distribution or sharding is completely inappropriate for the task. – APC Apr 09 '10 at 13:14
  • @APC it can work if your data can be sharded in such a way that a query can be run on each shard and the results unioned together (i.e. the entries don't heavily cross-reference each other). But if you're looking for a very abstract level of data warehousing, you're right, they would be inappropriate. – Kendall Hopkins Apr 09 '10 at 15:50
  • @KendallHopkins - I'm running out of ways to say this differently so this will be my last comment here. The reason why we use star schemas and multi-dimensional OLAP cubes is precisely because there is more than one way of looking at the data, and more than one way of aggregating it. Plus our users need the capability to run any query which pops into their heads. I don't think that is a "very abstract level of data warehousing": it's just what data warehouses *do*. – APC Apr 09 '10 at 16:56

2 Answers


In most cases, fewer relationships (joins) mean higher performance. Denormalization usually works best in small doses, in balance with normalization: it is a trade-off between speed and clean design. Data warehouses, for example, are built from very flat, denormalized tables precisely to increase query performance. The idea of flat, join-free data is taken even further in document stores such as CouchDB.
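To make the trade-off concrete, here is a minimal sketch in SQLite (via Python's `sqlite3`). The table and column names are simplified stand-ins, not the real SSB schema: a tiny fact table plus one dimension, then a flat copy built with `CREATE TABLE ... AS SELECT` so the same question can be answered without a join.

```python
# Hypothetical mini star schema: 'lineorder' fact table + 'date_dim' dimension.
# Names are illustrative only, not the actual SSB definitions.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Dimension table: one row per date key
cur.execute("CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, year INTEGER)")
cur.executemany("INSERT INTO date_dim VALUES (?, ?)", [(1, 1997), (2, 1998)])

# Fact table: references the dimension by key
cur.execute("CREATE TABLE lineorder (order_key INTEGER, date_key INTEGER, revenue INTEGER)")
cur.executemany("INSERT INTO lineorder VALUES (?, ?, ?)",
                [(100, 1, 50), (101, 1, 70), (102, 2, 30)])

# Normalized query: filtering by year requires a join to the dimension
normalized = cur.execute("""
    SELECT SUM(l.revenue)
    FROM lineorder l JOIN date_dim d ON l.date_key = d.date_key
    WHERE d.year = 1997
""").fetchone()[0]

# Denormalize: fold the dimension attribute into one flat table.
# The year value is now repeated on every fact row -- that is the storage cost.
cur.execute("""
    CREATE TABLE lineorder_flat AS
    SELECT l.order_key, l.revenue, d.year
    FROM lineorder l JOIN date_dim d ON l.date_key = d.date_key
""")

# Same question, no join at query time -- this is where the speed-up comes from
flat = cur.execute(
    "SELECT SUM(revenue) FROM lineorder_flat WHERE year = 1997"
).fetchone()[0]

print(normalized, flat)  # both queries return the same total
```

Doing this for every SSB dimension would give one very wide fact table: every query loses its joins, but each fact row carries a full copy of its dimension attributes, which is the size increase the question describes.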

Ivo Sabev