2

I have a (simplified) query like so:

SELECT DISTINCT this_.person_id AS y0_
  FROM coded_entity this_
 WHERE this_.code     = ?
   AND this_.tenant_id  = ?
   AND this_.person_id IN (SELECT person_id
                             FROM person that
                            WHERE that.date_voided IS NULL
                              AND that.ssn            = ?
                              AND that.tenant_id      = ? ) 

I would like to be able to force Oracle to always execute the innermost subquery first because I know it will always be more selective. Code only has a few values whereas the ssn has a far greater multitude. Each table will have the same number of rows.

However, in actual practice, sometimes the Oracle CBO decides to execute the coded_entity query first, causing much slower times.

Is there a way to force this behavior without splitting up the query into separate calls?

AHungerArtist
  • 9,332
  • 17
  • 73
  • 109
  • Have you thought about using the cardinality hint? – Ed Heal Jul 26 '17 at 15:45
  • 2
    Gather stats, Oracle is not so stupid, there must a be reason why the exec plan is not stable. Otherwise as a last resort you can use CARNALITY, MATERIALIZE or NO_PUSH_PRED hints – ibre5041 Jul 26 '17 at 15:52
  • Is ``person_id` column a primary key in person table ? – krokodilko Jul 26 '17 at 15:53
  • 2
    IMHO YOU should start with GATHER_PLAN_STATISTICS to deduce where optimized estimations are wrong. – ibre5041 Jul 26 '17 at 15:53
  • @krokodilko Yes, it is. – AHungerArtist Jul 26 '17 at 15:53
  • @ibre5041 I have recently run EXECUTE DBMS_STATS.GATHER_SCHEMA_STATS('USER',DBMS_STATS.AUTO_SAMPLE_SIZE); – AHungerArtist Jul 26 '17 at 15:56
  • Then add `UNION ALL select -99999 FROM dual` to the subquery, this will prevent the optimier from doing subquery unnesting optimalization, and the subquery will always be evaluated first. You can also use hints: ORDERED or LEADING. – krokodilko Jul 26 '17 at 16:06
  • @krokodilko I will try those suggestions and see how they work out. – AHungerArtist Jul 26 '17 at 16:08
  • your aliases for two different tables are identical (main query and subquery). Oracle can figure out scope (and we can). Would not hurt to make your example more readable (use different table aliases). – Patrick Bacon Jul 26 '17 at 16:18
  • With your aliases, it is not clear whether we have a correlated subquery or not. If you can meet your requirement without the correlated subquery, you probably would get better performance. It probably would not hurt to run a sql profile through the sql tuning advisor (integrated gui with SQL Developer though you might not be licensed). Perhaps the cbo will then determine that the subquery should be executed first. – Patrick Bacon Jul 26 '17 at 16:35
  • @PatrickBacon They're specifically not a correlated subquery because this query was a rewrite one of that would occasionally create an execution plan that created a cartesian merge (we would have millions of executions a day and then randomly it would decide to use a cartesian every few months). I will edit the naming to reflect this reality. – AHungerArtist Jul 26 '17 at 16:37
  • Why not just JOIN and then filter for DISTINCT records? Let the optimizer figure it out. – Shawn Jul 26 '17 at 18:21
  • 1
    BTW, jonathan lewis did a series on reading/analyzing execution plans and this particular entry, http://allthingsoracle.com/execution-plans-part-6-pushed-subqueries/, where he introduces the two query hints with the same intentions as you: /*+ no_unnest push_subq */ – Patrick Bacon Jul 26 '17 at 20:52

3 Answers3

1

My first thought (irrespective of keys) is to try this:

SELECT DISTINCT this_.person_id AS y0_
  FROM coded_entity this_
 WHERE this_.code     = ?
   AND this_.tenant_id  = ?
   AND EXISTS (SELECT null
                 FROM person THAT_
                WHERE THAT_.date_voided IS NULL
                  AND THAT_.ssn            = ?
                  AND THAT_.tenant_id      = this_.tenant_id
                  AND THAT_.person_id      = this_.person_id) 

But a better way (if person_id is the key of person and for every person there are zero or more coded entities) would be this:

SELECT this_.person_id AS y0_
  FROM person_ this_
 WHERE this_.ssn        = ?
   AND this_.tenant_id  = ?
   AND this_.date_voided is null
   AND EXISTS (SELECT null
                 FROM coded_entity THAT_
                WHERE THAT_.code = ?
                  AND THAT_.tenant_id      = this_.tenant_id
                  AND THAT_.person_id      = this_.person_id) 
Jeff Holt
  • 2,940
  • 3
  • 22
  • 29
0

No need for a sub-query. Just INNER JOIN and filter for unique records. Let the optimizer figure it out a bit better.

SELECT DISTINCT this_.person_id AS y0_
FROM coded_entity this_
INNER JOIN person that ON this_.person_id = that.person_id
    AND this_.tenant_id = that.tenant_id
    AND that.date_voided IS NULL
    AND that.ssn = ?
 WHERE this_.code     = ?
   AND this_.tenant_id  = ?
Shawn
  • 4,758
  • 1
  • 20
  • 29
  • 1
    See, that's actually the original query I had, which I am rewriting. And it ran great most of the time. However, every so often, a cartesian merge would result, causing massively long query times -- effectively breaking things. Our DBAs never really found a cause so this was my attempt to try and get away from things that might cause a join situation. – AHungerArtist Jul 26 '17 at 18:39
  • Interesting. They weren't able to tell you what data inputs were causing the long JOINs? I can't see anything there that would cause a weird JOIN. Bad data in the db, maybe? Or wonky index? – Shawn Jul 26 '17 at 18:42
  • Actually, if you're working with a lot of data, you may be able to shorten it by changing the JOIN's `AND this_.tenant_id = that.tenant_id` back to `AND that.tenant_id = ?`. – Shawn Jul 26 '17 at 18:47
  • Yeah, our DBAs were not very big on investigating... just a constant refrain of "rewrite the query!." It was frustrating. I think there's an issue with statistics/histograms (they have all histograms off) but I really don't have the knowledge to say. – AHungerArtist Jul 27 '17 at 15:43
0

You can always rewrite the script to have the condition apply first as a subselect and then on top of that apply the additional conditions, like this:

SELECT DISTINCT this_.person_id AS y0_
FROM (
SELECT *
  FROM coded_entity this_
 WHERE this_.person_id IN (SELECT person_id
                             FROM person that
                            WHERE that.date_voided IS NULL
                              AND that.ssn            = ?
                              AND that.tenant_id      = ? ) 
) sub
where this_.code     = ?
   AND this_.tenant_id  = ?

Or you can rewrite it in the join format:

SELECT DISTINCT this_.person_id AS y0_
  FROM coded_entity this_
  JOIN person that
    ON that.person_id = this_person_id
    AND that.date_voided IS NULL
    AND that.ssn            = ?
    AND that.tenant_id      = ?
 WHERE this_.code     = ?
   AND this_.tenant_id  = ?

Or what I think will definitely work is to just have properly gathered statistics on both tables and then the CBO will decide correctly.

Matúš Nagy
  • 1
  • 1
  • 1