
I have a problem where I have a large database that takes a very long time to access, partly because it's normalized and requires a lot of joins (it's very similar to a star schema right now). However, I don't care about write time or potential write anomalies, and I need the database to be fast for analysis. Can someone please point me to a good text on database schemas that are well suited for analytics and not so much for online transaction performance?

E.g. tables: table 1 has static product information, and table 2 has an instance for every time the product was bought/viewed/... (certain n actions done on it), along with the cost paid for each such action. A lot of my work involves finding certain products on which certain actions were done.

Thanks.

0fnt
  • Would anyone suggest creating one table for each product that contains all its operations/amounts? – 0fnt Feb 09 '12 at 20:07

2 Answers


"Normalized" doesn't always mean "requires lots of joins". But using surrogate keys (id numbers) usually does.

Build a test database, normalizing to 5NF without using id numbers for anything that has a natural key. So, for example, no id numbers for countries (use the ISO code), states, ZIP codes, categories, etc.

This kind of structure implements a space/time tradeoff. Up to a certain point, and under certain conditions, natural keys will perform faster than surrogates, because often the "key" information (cough) is carried in the natural key. So you won't need a join to get to it. But there comes a point where surrogate keys are faster, because they require less I/O, and more rows will fit in a page. You need to test to determine where that point is, and whether you can live with it.
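To make the tradeoff concrete, here is a minimal sketch of the natural-key approach, assuming an in-memory H2 database on the classpath; the table and column names are invented for illustration. Because the fact table carries the ISO country code directly, the filter below needs no join to the countries table:

```java
import java.sql.*;

public class NaturalKeyDemo {
    public static void main(String[] args) throws SQLException {
        // In-memory H2 database (assumed on the classpath); any JDBC engine works similarly.
        try (Connection c = DriverManager.getConnection("jdbc:h2:mem:demo");
             Statement s = c.createStatement()) {

            // Natural key: the ISO country code IS the primary key, so every
            // referencing row already carries the readable value.
            s.execute("CREATE TABLE countries (" +
                      " iso_code CHAR(2) PRIMARY KEY," +
                      " name VARCHAR(100) NOT NULL)");
            s.execute("CREATE TABLE purchases (" +
                      " product_sku VARCHAR(20) NOT NULL," +
                      " country_code CHAR(2) NOT NULL REFERENCES countries(iso_code)," +
                      " amount DECIMAL(10,2) NOT NULL)");
            s.execute("INSERT INTO countries VALUES ('DE','Germany'), ('US','United States')");
            s.execute("INSERT INTO purchases VALUES ('SKU-1','DE',19.99), ('SKU-1','US',24.99)");

            // No join: the "key" information travels with the fact row itself.
            try (ResultSet rs = s.executeQuery(
                    "SELECT product_sku, country_code, amount" +
                    " FROM purchases WHERE country_code = 'DE'")) {
                while (rs.next()) {
                    System.out.printf("%s %s %s%n",
                        rs.getString(1), rs.getString(2), rs.getBigDecimal(3));
                }
            }
        }
    }
}
```

With surrogate keys, the same filter would need a join just to recover the readable code; that is where the extra I/O goes, until rows grow wide enough that the surrogate's smaller footprint wins.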

Mike Sherrill 'Cat Recall'

Keep your schema as it is, i.e. "correct", but index fully denormalized (i.e. ready-to-render) versions of your data in a text search engine, for example Lucene.

Text search engines do not offer update or join capabilities, but they are FAST! You won't believe how fast until you see it for yourself, but response times will be on the order of a couple of milliseconds.

Text search engines are so fast because they use an inverted index. I've implemented them a few times, and it's always been well worth it. All you need to do is make sure every time your real data changes, you re-index whatever is affected in your text search engine.
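To illustrate, here is a minimal sketch of the denormalized-document idea against the Lucene Java API (8.x-era class names; the field names and values are invented). Each document flattens a product's static fields together with one action, so "products on which a certain action was done" becomes a single inverted-index lookup, with no joins:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.*;
import org.apache.lucene.search.*;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class ProductActionIndex {
    public static void main(String[] args) throws Exception {
        Directory dir = new ByteBuffersDirectory(); // in-memory; use FSDirectory on disk

        // One fully denormalized document per product action: product fields
        // and action fields flattened together, so no joins at query time.
        try (IndexWriter writer =
                 new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            Document doc = new Document();
            doc.add(new StringField("sku", "SKU-1", Field.Store.YES));            // exact match
            doc.add(new TextField("product_name", "red widget", Field.Store.YES)); // tokenized
            doc.add(new StringField("action", "purchase", Field.Store.YES));
            doc.add(new DoublePoint("cost", 19.99));        // indexed for range queries only
            doc.add(new StoredField("cost_stored", 19.99)); // stored separately for retrieval
            writer.addDocument(doc);
        }

        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            // "Products on which a certain action was done", within a cost range.
            Query q = new BooleanQuery.Builder()
                .add(new TermQuery(new Term("action", "purchase")), BooleanClause.Occur.MUST)
                .add(DoublePoint.newRangeQuery("cost", 10.0, 50.0), BooleanClause.Occur.MUST)
                .build();
            TopDocs hits = searcher.search(q, 10);
            for (ScoreDoc sd : hits.scoreDocs) {
                Document d = searcher.doc(sd.doc);
                System.out.println(d.get("sku") + " " + d.get("product_name"));
            }
        }
    }
}
```

On live data you would re-run the indexing block (or call IndexWriter.updateDocument with a key term) whenever the underlying rows change; that is the re-indexing step mentioned above.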

Bohemian
  • Thanks, indeed a very nice solution. Will try it out. However, it wouldn't take into account the fact that some fields are sorted. Also, not to nitpick, and definitely not being ungrateful, but quoting "a couple of milliseconds" without saying anything about the data size, query type, etc. doesn't seem very correct! Thanks a lot for the answer. People, please keep the answers coming. – 0fnt Feb 09 '12 at 20:05
  • Also, I have a lot of numbers (doubles); would it play nice with those? And can you briefly tell me how it would be faster than a DB that also has its numeric fields indexed for fast comparison? – 0fnt Feb 09 '12 at 20:17
  • A couple of milliseconds is pretty much a standard response time for any normal query. Regarding doubles, if you're using Java, check out [solr](http://lucene.apache.org/solr/) - it's a Java layer on top of Lucene which lets you save and retrieve basic Java types in its fields. Searching can be exact, partial, or over ranges, for all data types. There's also direct support for lat/long area searching. It's truly awesome. – Bohemian Feb 09 '12 at 22:12
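Following up on the Solr pointer in the last comment, here is a minimal SolrJ sketch (8.x-era client; the core name and field names are invented) showing a range search on a double field:

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class SolrRangeSearch {
    public static void main(String[] args) throws Exception {
        // Assumes a Solr core named "products" with sku, action, and cost
        // (a double field) defined in its schema; adjust the URL to taste.
        try (HttpSolrClient solr =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/products").build()) {
            // Range search on a numeric field uses standard bracket syntax.
            SolrQuery q = new SolrQuery("action:purchase AND cost:[10.0 TO 50.0]");
            q.setRows(10);
            QueryResponse rsp = solr.query(q);
            for (SolrDocument d : rsp.getResults()) {
                System.out.println(d.getFieldValue("sku") + " " + d.getFieldValue("cost"));
            }
        }
    }
}
```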