
I was wondering whether you think it's reasonable, using MonetDB (or another columnar database), to put all of your data in one big, flat table rather than breaking it up into several related tables.

For example, a database of used cars, flat, might look like:

Make    Model   Year   Color    Mileage
Chevy   Malibu  2009   orange   102100   
Chevy   Malibu  2009   orange   98112
Chevy   Malibu  2008   orange   210232
Chevy   Malibu  2009   pink     150100

Noticing the redundancy in Make-Model-Year-Color, in a SQL database or Excel spreadsheet or whatever, you might have two tables like:

mId   Make   Model   Year  Color
1     Chevy  Malibu  2009  orange
2     Chevy  Malibu  2008  orange
3     Chevy  Malibu  2009  pink

mId   Mileage
1     102100   
1     98112
2     210232
3     150100

This helps with the redundancy at the expense of more complex queries and having to think about how to decompose (break up) the tables.
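The trade-off above can be made concrete. The sketch below builds the two decomposed tables from the example in SQLite (table and column names are my own, chosen for illustration) and shows the join needed to get the flat view back:

```python
import sqlite3

# Illustrative sketch: the decomposed used-car tables from the example,
# built in an in-memory SQLite database. Table and column names are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE model (
        mId   INTEGER PRIMARY KEY,
        make  TEXT, model TEXT, year INTEGER, color TEXT
    );
    CREATE TABLE car (
        mId     INTEGER REFERENCES model(mId),
        mileage INTEGER
    );
    INSERT INTO model VALUES
        (1, 'Chevy', 'Malibu', 2009, 'orange'),
        (2, 'Chevy', 'Malibu', 2008, 'orange'),
        (3, 'Chevy', 'Malibu', 2009, 'pink');
    INSERT INTO car VALUES (1, 102100), (1, 98112), (2, 210232), (3, 150100);
""")

# Recovering the flat view requires a join -- the "more complex query"
# that decomposition costs you.
rows = conn.execute("""
    SELECT m.make, m.model, m.year, m.color, c.mileage
    FROM car c JOIN model m USING (mId)
    ORDER BY c.mileage
""").fetchall()
for r in rows:
    print(r)
```

With the flat table, the same result is a plain `SELECT` with no join, which is exactly the convenience being weighed here.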

I was reading about columnar databases, and MonetDB in particular. It seems that, since MonetDB compresses columns individually, the redundancy doesn't matter, and you could just use the flat table expecting the same or better performance (query time, disk usage) than a well-decomposed set of relational tables would provide. This saves design effort, and better still, it lets you completely automate schema design by avoiding it.
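The intuition behind "the redundancy doesn't matter" can be sketched with a toy encoder. This is not MonetDB's actual storage format, just an illustration of why a column full of repeated values stores compactly when dictionary-encoded and then run-length-encoded:

```python
from itertools import groupby

# Toy sketch (not any real database's format): dictionary-encode a column,
# then run-length-encode the codes, to show how a flat table's redundant
# columns can still be stored compactly column-by-column.
def dict_rle(column):
    # Map each distinct value to a small integer code (insertion order)...
    codes = {v: i for i, v in enumerate(dict.fromkeys(column))}
    encoded = [codes[v] for v in column]
    # ...then collapse runs of identical codes into (code, run_length) pairs.
    runs = [(k, sum(1 for _ in g)) for k, g in groupby(encoded)]
    return codes, runs

make_col = ['Chevy', 'Chevy', 'Chevy', 'Chevy']
year_col = [2009, 2009, 2008, 2009]

print(dict_rle(make_col)[1])  # [(0, 4)] -- one run covers the whole column
print(dict_rle(year_col)[1])
```

The `Make` column collapses to a single run regardless of row count, whereas a row store would repeat the string per row. Whether a given engine actually applies such encodings is a separate question (see the second answer below regarding MonetDB).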

What do you think? Is there some hidden cost that I'm not seeing?

  • SQL Server, Oracle, etc. can all compress data too, often very well. The issue isn't generally size, or even performance, it's the logical errors inevitably introduced by having a model that isn't properly normalized. Column-oriented databases are great - for analysis, not transactional processing. The column compression is a benefit there because star and snowflake schemas are expected to have many columns, each with a fairly narrow range of values across many rows. But those schemas are very poor choices for OLTP. – Aaronaught Nov 10 '13 at 22:00
  • Can I have your current view, as I am also looking into this approach to replace star schema design with a columnar database? Is there any helpful material you can refer me to? – irfan Munir Apr 11 '14 at 15:33
  • Columnar databases are created for proper use cases. Those use cases are different from what we have currently in relational normalized databases. We cannot fit everything in columnar databases. Columnar databases are not meant for data subject to frequent and many updates. – v83rahul Mar 06 '18 at 03:17

2 Answers


Seems like you got it right. In my experience, columnar databases in general, and MonetDB in particular, deliver extremely fast query times with a data structure like the one you described. For your example, a columnar database will encode and compress each column (which naturally contains data of the same type, with many repetitions).

That said, if your workload includes lots of updates, benchmark the solution before deciding.

Personally, I've seen MonetDB perform much better than most commercial column-oriented databases, and much, much better than row-oriented or NoSQL systems, but the bottom line to keep in mind is that every case has its own behavior.

Shay

What you are describing is (AFAIK) called the "unified table approach". Very smart people tried building systems around this idea and gave up on it. The latest (unsuccessful) attempt was the IBM DB2 Blink project (see page 3 of http://homepages.cwi.nl/~idreos/BlinkDebull2012.pdf). The essence: from a query-processing point of view, you will generally be better off with a normalized schema than with having the system figure out your schema for you.

To answer your specific question: MonetDB does not compress data other than strings (and even those only if there are few unique strings). I'd advise you to spend the effort to define a relational schema, or switch to a schemaless DBMS if you really cannot. That will naturally come at a performance penalty.

Holger