
I have a production table that has 2.5 billion rows. It has a column "created_date", which has both date and time components. The table is not partitioned. I have to export 2 months of data based on the "created_date" value. My options are either (1) to create a function-based index on "TRUNC(created_date)"; or (2) to create a virtual column as "TRUNC(created_date)" and then index that virtual column. I would then run a query-based expdp on this table using that index. Both options are sketched just below.
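For reference, a minimal sketch of the two options; the table name (my_table) and the index/column names are illustrative placeholders, not my actual schema:

    -- Option 1: function-based index directly on the expression
    CREATE INDEX ix_created_trunc ON my_table (TRUNC(created_date));

    -- Option 2: add a virtual column, then index it like a regular column
    ALTER TABLE my_table ADD (
      created_day DATE GENERATED ALWAYS AS (TRUNC(created_date)) VIRTUAL
    );
    CREATE INDEX ix_created_day ON my_table (created_day);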

Since there is no way for me to test which approach is better (even a slight improvement will help, because it is a production table with no downtime), does anyone have any advice on this matter? My assumption is that having a virtual column will not cause an index-maintenance operation every time a row is inserted into this table (about 7 million new rows per day), which will definitely happen with a function-based index. Also, does anyone have any idea how much time such an index creation (on 2.5 billion rows) may take? A sketch of the export step itself follows.
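For context, the query-based export I have in mind would be driven by a parameter file roughly like this (schema, directory, file names, and the exact date range are placeholders):

    # expdp parameter file (sketch; all names and dates are illustrative)
    tables=APP.MY_TABLE
    directory=DUMP_DIR
    dumpfile=my_table_%U.dmp
    logfile=my_table_exp.log
    query=APP.MY_TABLE:"WHERE TRUNC(created_date) BETWEEN DATE '2021-01-01' AND DATE '2021-02-28'"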

Jay
  • It’s not clear why you need to filter on trunc(date_col) rather than filtering on date_col directly and trunc’ing the dates you compare against (see the predicate sketch after these comments). Either way, a straightforward query without any new index will be faster than creating the index and then running another query. And no, a virtual column with an index on it will need updating just like any other index (otherwise how could you trust it?) – Andrew Sayer Feb 24 '21 at 20:40
  • @AndrewSayer the reason I am thinking of an index of some sort is that I am not sure how long the export process will take. We will probably have to do multiple runs with smaller date ranges, building up to the full 2 months of data. As I said, these approaches are still just an idea. I mainly wanted to find out whether anyone has experience with the time it takes to index a date column on a 2.5 billion row table – Jay Feb 24 '21 at 20:53
  • You'll be extracting roughly 1/5 of the rows, right? Doing a full table scan does not sound like a bad idea to me compared to building an index, doing an index range scan and then accessing the table by rowids... – Petr Feb 24 '21 at 21:13
  • Time to build the index will be made up of the full scan of your table (dependent on its size in bytes and your storage), the time taken to sort 2.5 billion rows (dependent on your storage speeds again, as you'll need to spill to temp a lot), and the time taken to write a 2.5 billion row index plus the redo for that (again storage speeds). If you're creating the index offline on modern flash storage, this will probably take about 8 hours; at the worse end, creating it online (more redo, slower reading from the table) on old spinning disks, you're looking at more like 30 hours at least. – Andrew Sayer Feb 24 '21 at 21:39
  • As others have commented, building an index will certainly take substantially more time than a simple query using a *full table scan*. The same is true for the query using the index - you would wait *days* to fetch 400+M rows via index access. Simply try a full-scan export, possibly with the parallel option (a sketch follows these comments), be prepared for *snapshot too old* errors etc., and think about introducing a *partitioned schema*. Good luck! – Marmite Bomber Feb 24 '21 at 21:51
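Pulling the comments together: the TRUNC() wrapper can be dropped by filtering the raw column with a half-open date range, which a plain (optionally parallel) full scan can serve without any new index. A minimal sketch, with the table name, the hinted parallel degree, and the dates as illustrative placeholders:

    -- Equivalent filter without TRUNC(): a half-open range on the raw column.
    -- No index is needed; a (parallel) full scan does the work.
    SELECT /*+ FULL(t) PARALLEL(t 8) */ *
      FROM app.my_table t
     WHERE t.created_date >= DATE '2021-01-01'
       AND t.created_date <  DATE '2021-03-01';

The same predicate can go straight into the expdp QUERY clause shown above, paired with parallel=N and a %U dumpfile template to spread the output across multiple files.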

0 Answers