
In our database (Oracle 12c on Exadata) we plan to store sales data. Input text files containing sales data arrive on a daily basis (~1000 files every day, each containing ~20,000 rows). The text files are read and transferred to the database as soon as possible. According to our calculations the data will grow to about 5 terabytes in one year.

Data format: [transaction date][category][sales_number][buyer_id][other columns]

Sales data comes in 10 different categories, all with the same fields. Logically the data can be stored in one single table, or it can be divided into 10 tables (one per category).

What is the best practice for storing this kind of big data in Oracle? What partitioning and indexing strategy should be applied?

Constraints: Data should be available for analysis by the marketing department within 2-3 days. Queries filter on [sales_number], or on [category] and [buyer_id], or on [buyer_id] alone.

fatihn

1 Answer


If the number of categories is known and fixed, then you can use a subpartition for each category. One approach could be this:

CREATE TABLE SALES_DATA
(
  TRANSACTION_DATE TIMESTAMP(0) NOT NULL,
  CATEGORY NUMBER NOT NULL,
  SALES_NUMBER NUMBER, 
  BUYER_ID NUMBER,
  [OTHER COLUMNS]
)
PARTITION BY RANGE (TRANSACTION_DATE) INTERVAL (INTERVAL '1' DAY)
SUBPARTITION BY LIST (CATEGORY)
SUBPARTITION TEMPLATE 
    (
        SUBPARTITION CAT_1 VALUES (1), 
        SUBPARTITION CAT_2 VALUES (2),
        SUBPARTITION CAT_3_AND_4 VALUES (3,4), 
        SUBPARTITION CAT_5 VALUES (5),
        ...
        SUBPARTITION CAT_10 VALUES (10),
        SUBPARTITION CAT_OTHERS VALUES (DEFAULT)        
    )
    (
        PARTITION P_INITIAL VALUES LESS THAN (TIMESTAMP '2018-01-01 00:00:00')  
    );

Local indexes would be needed on sales_number and buyer_id. You can put every (sub)partition into a separate tablespace if required.
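A minimal sketch of those local indexes could look like this (the index names are placeholders; note that a single local index on buyer_id serves both the [buyer_id] and the [category],[buyer_id] queries, because the category predicate is handled by subpartition pruning):

    -- Local index for lookups by sales_number;
    -- each (sub)partition gets its own index segment
    CREATE INDEX sales_data_sn_ix
      ON sales_data (sales_number) LOCAL;

    -- Local index for lookups by buyer_id; combined with a
    -- category predicate, Oracle prunes to the matching
    -- list subpartitions before probing this index
    CREATE INDEX sales_data_buyer_ix
      ON sales_data (buyer_id) LOCAL;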

Wernfried Domscheit
  • With many categories that would make queries on sales_number or buyer_id quite slow as Oracle would have to scan indexes for all partitions in the date range. If no date has been given all partitions must be processed. – ewramner Jan 11 '18 at 14:58
  • @user2612030, yes, that's right, but it depends on the number of categories. A global index on sales_number would be better. – Wernfried Domscheit Jan 11 '18 at 15:39
  • Without knowing the queries or the cardinality of the data, how do you know that you even need partitioning or indexes? – BobC Jan 18 '18 at 04:55
  • @BobC, the OP stated some numbers and type of queries. – Wernfried Domscheit Jan 18 '18 at 06:41
  • @WernfriedDomscheit Maybe I'm being dumb... where? – BobC Jan 18 '18 at 10:16
  • @BobC, *"Input text files containing sales data comes daily basis(~1000 files every day each containing ~20000 rows)."* and *"in 10 different categories"* and *"Queries based on [sales_number] or [category],[buyer_id] or [buyer_id] columns."* I think this information justifies my proposal. Maybe not by 100% but it should be a good starting point. – Wernfried Domscheit Jan 18 '18 at 10:35
  • But that does not really define the query nor the cardinalities. Maybe I'm just being a bit pedantic, since my focus is on performance ;) – BobC Jan 18 '18 at 16:55
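The global index on sales_number suggested in the comments could be sketched like this (hash partitioning of the index spreads concurrent inserts across segments; the partition count of 16 is an assumption to be tuned for the actual load):

    -- Global hash-partitioned index: queries on sales_number
    -- probe one index partition instead of one index per
    -- table (sub)partition
    CREATE INDEX sales_data_sn_gix
      ON sales_data (sales_number)
      GLOBAL PARTITION BY HASH (sales_number) PARTITIONS 16;

The trade-off is that dropping or truncating old date partitions then requires index maintenance (e.g. `UPDATE GLOBAL INDEXES`), whereas local indexes are maintained automatically.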