
Oracle gurus,

We are at the point of deciding between two designs: a table roughly 500 columns wide, or a table only 8 columns wide but about 4 billion rows deep. The table is updated once a week, every Sunday, by adding the most recently completed week's data. Since the data varies by fiscal week number, we have two schools of thought on the pros and cons of the designs -

For the wide table - the thought is to have a group of 3 attribute columns for each week number, going back 160 weeks, which gives us 160 x 3 = 480 week columns. Every week, when we add the latest week's data, we would drop the oldest week's columns and add columns for the new week. This table would have approximately 40 million rows based on the key defined on ColA - ColD (please refer to the picture below). Here's the example -

Wide table view
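
In case the picture doesn't come through, here is a rough DDL sketch of the wide idea (table and column names are placeholders, not our real schema; only the first two of the 160 week groups are written out):

    -- Wide layout sketch: one group of 3 attribute columns per fiscal week,
    -- repeated for 160 weeks (only the first two groups shown).
    CREATE TABLE weekly_facts_wide (
        cola            VARCHAR2(20) NOT NULL,
        colb            VARCHAR2(20) NOT NULL,
        colc            VARCHAR2(20) NOT NULL,
        cold            VARCHAR2(20) NOT NULL,
        wk201701_attr1  NUMBER,
        wk201701_attr2  NUMBER,
        wk201701_attr3  NUMBER,
        wk201702_attr1  NUMBER,
        wk201702_attr2  NUMBER,
        wk201702_attr3  NUMBER,
        -- ... 158 more week groups, 480 week columns in total ...
        CONSTRAINT weekly_facts_wide_pk PRIMARY KEY (cola, colb, colc, cold)
    );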

For the deep table - the ColA - ColD fields remain the same, except that there is a new week number column, so each key defined on ColA - ColD can appear once per week. When we load this table, the idea is to append only the latest week's rows with the appropriate week number and to have a separate purge (maintenance) process that removes the oldest week's rows. This table would have approximately 4 billion rows and be 8 columns wide. Here's a sample of how it could look -

Deep table view
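
And a matching sketch of the deep idea, again with placeholder names, range-partitioned by the fiscal week number as discussed below:

    -- Deep layout sketch: the week number becomes a key column,
    -- and each (week, ColA-ColD) combination is one row.
    CREATE TABLE weekly_facts_deep (
        week_number  NUMBER(6)    NOT NULL,   -- fiscal week, e.g. 201801
        cola         VARCHAR2(20) NOT NULL,
        colb         VARCHAR2(20) NOT NULL,
        colc         VARCHAR2(20) NOT NULL,
        cold         VARCHAR2(20) NOT NULL,
        attr1        NUMBER,
        attr2        NUMBER,
        attr3        NUMBER,
        CONSTRAINT weekly_facts_deep_pk
            PRIMARY KEY (week_number, cola, colb, colc, cold)
    )
    PARTITION BY RANGE (week_number)
    (
        PARTITION p_201701 VALUES LESS THAN (201702),
        PARTITION p_201702 VALUES LESS THAN (201703)
        -- ... one partition per fiscal week, 160 in total ...
    );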

We absolutely understand the need to partition whichever table we pick by week number. Use of the table - it will be queried many times by concurrent users for a matching week number and ColA value over the past 52 weeks, and the expectation is to produce a report from it in less than 5 minutes. I'm seeking advice from the Oracle gurus here: have you, in your experience, seen a table nearly 500 columns wide with columns dropped and added every week as data is loaded, and how does that affect performance for a highly concurrent report generation tool? Conversely, have you worked with a table as deep as 4 billion rows (where the columns do not change weekly), and what are the performance implications of a concurrent reporting process against it?
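
For reference, the report query would look something like the sketch below (bind variables and object names are illustrative only); with the deep design it should prune to just the 52 weekly partitions in the requested window:

    -- Typical report query sketch: one ColA value across the last 52 fiscal weeks.
    SELECT week_number,
           cola, colb, colc, cold,
           attr1, attr2, attr3
    FROM   weekly_facts_deep
    WHERE  cola        = :cola_value
    AND    week_number BETWEEN :from_week AND :to_week   -- 52-week window
    ORDER  BY week_number, colb, colc, cold;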

Thank you, and much appreciation for your time!! Brendon

  • From my point of view, I'd rather have the [8-column table x 4 billion rows]. I don't have experience with that much data, but I reckon it is much easier to write queries that deal with a limited number of columns (8, right?) and *tune* them than to virtually die keeping track of whether I selected the right 3-4 columns out of 500. Looks like a nightmare to me. – Littlefoot Mar 27 '18 at 18:01
  • 500 columns. No. Never. Never ever. What happens when they want 180 weeks in the past? You have to change your table and make a 600+ column table? The same change in your 8-column table... nothing but more records. Grow in records, not in columns. Partition and index to make it quick. 4 billion is a lot, but it's not THAT big. – JNevill Mar 27 '18 at 18:08
  • And you are suggesting dropping a column and adding a new column each week for the wide table. Lord no. That's crazy talk. Build your objects/schema once (database, tables, columns) and NEVER change them again. That's what you should be aiming for. – JNevill Mar 27 '18 at 18:10
  • @JNevill thanks, I totally get it and am aligned with it completely. Those were my thoughts too, but I wanted to do due diligence to understand whether the wide table logic has any merit from a performance standpoint, particularly when we want to run heavy week-selection queries against this table with a good degree of concurrency (more than 10 users at the same time). Thanks!! – Brendon Mar 27 '18 at 18:21
  • Partitioning by week will mean that when folks select against this huge table, they will really only be hitting records that are in that partition. Meaning the actual query is only hitting 1/160th of the 4 billion, or about 25 million records. Proper indexing for whatever query you will be running against this will really help. This is essentially what every RDBMS is designed to do: query huge "deep" tables quickly and allow tuning to achieve that. They are not designed to have their columns flipped in and out on the regular and provide speedy retrieval across really wide selections. – JNevill Mar 27 '18 at 18:25
  • Well... wide selections depend on the RDBMS product. Some RDBMSs have columnar partitioning, which would help, but again, rebuilding the partitioning every time you swap columns would be a nightmare. – JNevill Mar 27 '18 at 18:26
  • “Proper indexing” noted above may in fact mean no indexes! – BobC Mar 31 '18 at 22:54

1 Answer


You want a table with a consistent projection. That means the eight-column, four-billion-row configuration.

Dropping columns is an expensive task in and of itself. Beyond that, you would need to change all the code which references the table every week, which doesn't seem like a good idea. The alternative would be to use dynamic SQL for every call against this table, which is even more undesirable.
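
To put a rough shape on that weekly churn (object names borrowed from the sketches in the question, so treat them as placeholders): even the gentler route of marking columns unused still leaves a deferred rewrite of the whole table.

    -- What the wide design forces every week (placeholder names).
    -- SET UNUSED is quick, but the space is only reclaimed later by
    -- DROP UNUSED COLUMNS, which rewrites every row in the table.
    ALTER TABLE weekly_facts_wide
        SET UNUSED (wk201701_attr1, wk201701_attr2, wk201701_attr3);

    ALTER TABLE weekly_facts_wide
        ADD (wk201812_attr1 NUMBER, wk201812_attr2 NUMBER, wk201812_attr3 NUMBER);

    -- Deferred physical drop, typically run in a maintenance window.
    ALTER TABLE weekly_facts_wide DROP UNUSED COLUMNS;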

With four billion rows you should definitely buy the Partitioning option. Your queries will benefit from partition pruning, assuming most of them filter on WeekNumber. Beyond that, the ability to load data through partition exchange and to remove it with DROP PARTITION is invaluable when wrangling large amounts of data.
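
A minimal sketch of that weekly load-and-purge cycle, using the placeholder deep table from the question (the staging table must match the partitioned table's column layout, and index handling depends on your indexing strategy):

    -- 1. Build a plain staging table with the same shape as the deep table.
    CREATE TABLE weekly_facts_stage AS
    SELECT * FROM weekly_facts_deep WHERE 1 = 0;

    -- ... load the latest week's ~25 million rows into weekly_facts_stage ...

    -- 2. Add a partition for the new week and swap the staging table in.
    ALTER TABLE weekly_facts_deep
        ADD PARTITION p_201813 VALUES LESS THAN (201814);

    ALTER TABLE weekly_facts_deep
        EXCHANGE PARTITION p_201813 WITH TABLE weekly_facts_stage;

    -- 3. Purge the oldest week as a metadata operation instead of deleting rows.
    ALTER TABLE weekly_facts_deep
        DROP PARTITION p_201701 UPDATE GLOBAL INDEXES;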

APC