I maintain a table in Oracle that contains several hundred thousand lines of code, including a priority column, which indicates for each line its importance according to the needs of the system.
ID | BRAND | COLOR | VALUE | SIZE | PRIORITY | EFFECTIVE_DATE_FROM | EFFECTIVE_DATE_FROM |
---|---|---|---|---|---|---|---|
1 | BL | BLUE | 58345 | 12 | 1 | 10/07/2022 | NULL |
2 | TK | BLACK | 4455 | 1 | 1 | 10/07/2022 | NULL |
3 | TK | RED | 16358 | 88 | 2 | 11/01/2022 | NULL |
4 | WRA | RED | 98 | 10 | 6 | 18/07/2022 | NULL |
5 | BL | BLUE | 20942 | 18 | 7 | 02/06/2022 | NULL |
At any given moment thousands more rows may enter the table, and it is necessary to SELECT from it the 1000 rows with the highest priority.
Although the naive solution is to SELECT using ORDER BY PRIORITY ASC, we find that the process takes a long time when the table contains a very large amount of rows (say over 2 million records).
The solutions proposed so far are to divide the table into 2 different tables, so that in advance the records with priority 1 will be entered into Table A, and the rest of the records will be entered in Table B, and then we will SELECT using UNION between the two tables. This way we save the ORDER BY process since Table A always contains priority 1, and it remains to sort only the data in Table B, which is expected to be smaller than the overall large table.
On the other hand, it was also suggested to leave the large table in place, and perform the SELECT using PARTITION BY on the priority column.
I searched the web for differences in speed or efficiency between the two options but did not find one, and I am debating how to do it. So which of the options is preferable if we focus on the efficiency and complexity of time?