1

I've read that using a where exists clause could usually be more efficient than writing select distinct. How could I rewrite the below 2 queries using the where exists condition? Not sure if query2 is eligible for this clause or this only applies to joins.

Query 1:

SELECT DISTINCT
    e.field1,
    regexp_substr(substr(TRIM(d.field2), 1, 2), '[A-Za-z]+', 1, 1) postal_group
FROM
    table1   e
    JOIN table1   f ON f.field0 = e.field0
    JOIN table2   g ON g.field3 = f.field3
    JOIN table3   g ON g.field4 = f.field4
    JOIN table4   a ON a.field5 = g.field5
    JOIN table5   b ON ( b.field6 = a.field6
                       AND b.field7 = a.field7 )
    JOIN table6   c ON ( c.field8 = b.field8
                       AND c.field9 = b.field9 )
    JOIN table7   d ON ( d.field10 = c.field10
                       AND d.field11 = c.field11 )

Query 2:

SELECT DISTINCT
    field
FROM
    table1
WHERE
    condition = 'value'
  • 1
    I suppose the 'exists vs distinct' advice refers to this kind of scenario https://stackoverflow.com/a/43321107/230471 where you only want rows from `e` and the other tables are only present as an existence check. By the way, tables have columns not fields, and there are no brackets in the `join` syntax. – William Robertson Apr 23 '21 at 19:44
  • Thanks for your answer. I think I understand the concept for 1 field in a 2 table join, but I'm struggling to apply it for my first query. So far, the answer provided by Kazi Mohammad Ali Nur is throwing an error because it does not recognize field2 in table 2. –  Apr 24 '21 at 11:14

2 Answers2

1

For your second query you can use group by clause to avoid DISTINCT.

SELECT field
FROM
    table1
WHERE
    condition = 'value'
Group by field

Please try below query with where exists instead of Distinct:

SELECT 
    e.field1,
    regexp_substr(substr(TRIM(d.field2), 1, 2), '[A-Za-z]+', 1, 1) postal_group
FROM
    table1   e
    where exists 
    (
        select 1 from table1   f 
        JOIN table2   g ON g.field3 = f.field3
        JOIN table3   g ON g.field4 = f.field4
        JOIN table4   a ON a.field5 = g.field5
        JOIN table5   b ON ( b.field6 = a.field6
                           AND b.field7 = a.field7 )
        JOIN table6   c ON ( c.field8 = b.field8
                           AND c.field9 = b.field9 )
        JOIN table7   d ON ( d.field10 = c.field10
                           AND d.field11 = c.field11 )
        where f.field0 = e.field0
    )
  • 1
    Yeah, sorry, I had grouped by to avoid the distincts, that's the question amended to show the distinct for query1. As for query 2, I was asking specifically about the where exists clause and whether it could be rewritten with it. However, is grouping by there more efficient than selecting distincts? –  Apr 23 '21 at 17:12
  • 2
    Actually you need to see the execution plan of your query with both group by and distinct to know for sure. – Kazi Mohammad Ali Nur Romel Apr 23 '21 at 17:15
  • I've just tried the approach for query2 and but does not recognize field2 from table7 (d). What could be happening? –  Apr 24 '21 at 08:39
  • 1
    There is no `d` for it to refer to, only `e`. – William Robertson Apr 24 '21 at 12:37
  • I would want the distinct of columns e.field1 and d.field2 –  Apr 24 '21 at 12:53
0

You can't avoid a join with d because you need a column from it. Therefore the best you can do is:

select e.column1
     , d.column2 as postal_group
from   table1 e
       join table7 d
            on  d.column10 = c.column10
            and d.column11 = c.column11
where  exists
       ( select 1 from table1 f
                join table2 g on g.column3 = f.column3
                join table3 h on h.column4 = f.column4
                join table4 a on a.column5 = h.column5
                join table5 b on b.column6 = a.column6 and b.column7 = a.column7
                join table6 c on c.column8 = b.column8 and c.column9 = b.column9
         where  f.column0 = e.column0 );

You only need distinct if table7.column2 can have duplicate values for a (column10, column11) combination.

This may or may not be more efficient than the original version - compare timings, execution plans, reads etc.

(I've renamed 'fields' as columns because that's what tables have. Also there are no brackets in a join clause and adding them tends to confuse code formatters.)

William Robertson
  • 15,273
  • 4
  • 38
  • 44
  • This is what I was looking for, thanks! However, I've just checked and seems less efficient that the original one, so I guess the where exists is only more efficiente for columns from a single table. –  Apr 26 '21 at 06:14