0

I'm not an DBA expert, we have an existing Oracle query to extract data for a particular day , the problem we've is if the business volume for a day is extremly large, the query takes 8+ hours and timedout. We cannot do optimization inside the database itself, then how do we usually handle extreme case like this? I've pasted the query below with content masked to show the SQL structure, looking for advises on how to optimizae this query or any alternative way to avoid timeout.

WHENEVER SQLERROR EXIT 1
SET LINESIZE 9999
SET ECHO OFF
SET FEEDBACK OFF
SET PAGESIZE 0
SET HEADING OFF
SET TRIMSPOOL ON
SET COLSEP ","

SELECT co.cid
  || ',' || DECODE(co.cid,'xxxxx','xxxx',null,'N/A','xxxxx')
  || ',' || d.name
  || ',' || ti.rc
  || ',' || DECODE(cf.side_id,1,'x',2,'xx',5,'xx','')
  || ',' || cf.Quantity
  || ',' || cf.price
  || ',' || TO_CHAR(time,'YYYY-mm-dd hh24:mi:ss')
  || ',' || DECODE(co.capacity_id,1,'xxxx',2,'xxxx','')
  || ',' || co.type
  || ',' || cf.id
  || ',' || CASE
              WHEN (cf.account_id = xxx OR cf.account_id = xxx) THEN SUBSTR(cf.tag, 1, INSTR(cf.tag, '.')-1) || '_' || ti.ric || '_' || DECODE(cf.side_id,1,'xx',2,'xx',5,'xx','')
              WHEN INSTR(cf.clientorder_id, '#') > 0 THEN SUBSTR(cf.clientorder_id, 1, INSTR(cf.clientorder_id, '#')-1)
              ELSE cf.clientorder_id
            END
  || ',' || co.tag
  || ',' || t.description
  || ',' || CASE
              WHEN cf.id = xxx THEN 'xxxx'
              ELSE (SELECT t.name FROM taccount t WHERE t.account_id = cf.account_id)
            END as Account
FROM clientf cf, tins ti, thistory co, tdk d, tra t
WHERE cf.sessiondate = TO_DATE('xxxxxx','YYYYMMDD')
AND cf.orderhistory_id = co.orderhistory_id
AND cf.reporttype_id = 1
AND ti.inst_id = cf.inst_id
AND (ti.rc LIKE '%.xx' or ti.rc LIKE '%.xx' or ti.rc LIKE '%.xx' )
AND d.de_id = t.de_id
AND t.tr_id = co.tr_id
AND nvl(co.type_id,0) <> 3
AND cf.trid not in (SELECT v2.pid FROM port v2 WHERE v2.sessiondate = cf.sessiondate AND v2.exec_id = 4)
ORDER BY co.cid, time, cf.quantity;
enfany
  • 875
  • 2
  • 15
  • 33
  • 3
    "We cannot do optimization inside the database itself" ... hard to optimize a query if you can't do that. The only thing that sticks out at me is this line `AND (ti.rc LIKE '%.xx' or ti.rc LIKE '%.xx' or ti.rc LIKE '%.xx' )` since it obviously doesn't use an index but it's hard to tell whether it's ultimately going to matter. – David Faber Jun 02 '18 at 00:35
  • 3
    Did you run an explain plan? – OldProgrammer Jun 02 '18 at 00:42
  • You have a correlated subquery `..AND cf.trid not in (SELECT v2.pid FROM port v2 ..` this will be executed repeatedly. Try to remove this by using temp table on port (filter out exec_id) or try to include port in the join ssomehow. – Peeyush Jun 02 '18 at 01:52
  • 1
    Without an explain plan we are going to just be guessing. Show the explain plan please. – Sam M Jun 02 '18 at 03:22
  • +1 on requests for explain plan (or, better yet, see: https://stackoverflow.com/a/50605440/5174436 for how to use DBMS_XPLAN for more detail). In the meantime, is there an index on `CLIENTF(reportype_id,sessiondate)`? – Matthew McPeak Jun 02 '18 at 11:38
  • Is `port.pid` a mandatory (`not null`) column? The possibility of null values (even if there aren't any) limits what the optimiser can do with a `not in` subquery. By the way, *how do we usually handle extreme case like this?* is something only you can answer. – William Robertson Jun 02 '18 at 11:42

1 Answers1

1

I would firstly talk to the people who need the output of this query and ask them about the report and each individual column. Sometimes, some columns are not needed any more, sometimes the whole report. 8+ hours runtime is a good bargaining point ;-)

Next, I would put the original query to one side and start build an test query from scratch, bit by bit, for instance starting with clientf, taking all it's columns in the WHERE clause:

SELECT *
  FROM clientf SAMPLE (0.1) cf 
 WHERE cf.sessiondate = TO_DATE('xxxxxx','YYYYMMDD')
   AND cf.reporttype_id = 1;

If that's ok, I'd increase the sample size until 99%. If the runtime is already to long, you might suggest an index on clientf.sessiondate (or may be on clientf.reporttype_id, but that's unlikely helpful as it looks like to have too few distinct values).

Once that is done, I'd join the first table:

SELECT *
  FROM clientf SAMPLE (0.1) cf 
 WHERE cf.sessiondate = TO_DATE('xxxxxx','YYYYMMDD')
   AND cf.reporttype_id = 1
   AND cf.trid NOT IN (SELECT v2.pid 
                         FROM port v2 
                        WHERE v2.sessiondate = cf.sessiondate 
                          AND v2.exec_id = 4);

I'd compare NOT IN and WHERE NOT EXISTS, not expecting much differences.

Then I'd join the next table (prefering personally ANSI syntax), again starting with a small sample, again adding it's columns to the where clause:

SELECT *
  FROM clientf SAMPLE (0.1) cf 
  FROM thistory co 
    ON cf.orderhistory_id = co.orderhistory_id
 WHERE cf.sessiondate = TO_DATE('xxxxxx','YYYYMMDD')
   AND cf.reporttype_id = 1
   AND nvl(co.type_id,0) <> 3
   AND cf.trid NOT IN (SELECT v2.pid 
                         FROM port v2 
                        WHERE v2.sessiondate = cf.sessiondate 
                          AND v2.exec_id = 4);

I'd play around replacing nvl(co.type_id,0)<>3 with (co.type_id <>3 OR co.type_id IS NULL), monitoring carefully that the result is logically the same.

And so on...

wolφi
  • 8,091
  • 2
  • 35
  • 64