8

I have a PostgreSQL function that returns a query result to pgadmin results grid REALLY FAST. Internally, this is a simple function that uses a dblink to connect to another database and does a query return so that I can simply run

SELECT * FROM get_customer_trans();

And it runs just like a basic table query.

The issue is when I use the NOT IN clause. So I want to run the following query, but it takes forever:

SELECT * FROM get_customer_trans()
WHERE user_email NOT IN 
    (SELECT do_not_email_address FROM do_not_email_tbl);

How can I speed this up? Anything faster than a NOT IN clause for this scenario?

Horse Voice
  • 8,138
  • 15
  • 69
  • 120

2 Answers2

21

get_customer_trans() is not a table - probably some stored procedure, so query is not really trivial. You'd need to look at what this stored procedure really does to understand why it might work slow.

However, regardless of stored procedure behavior, adding following index should help a lot:

CREATE INDEX do_not_email_tbl_idx1
    ON do_not_email_tbl(do_not_email_address);

This index lets NOT IN query to quickly return answer. However, NOT IN is known to have issues in older PostgreSQL versions - so make sure that you are running at least PostgreSQL 9.1 or later.

UPDATE. Try to change your query to:

SELECT t.*
FROM get_customer_trans() AS t
WHERE NOT EXISTS (
    SELECT 1
    FROM do_not_email_tbl
    WHERE do_not_email_address = t.user_email
    LIMIT 1
)

This query does not use NOT IN, and should work fast. I think that in PostgreSQL 9.2 this query should work as fast as one with NOT IN though.

mvp
  • 111,019
  • 13
  • 122
  • 148
  • Ive already have an index on that column in do_not_email_tbl. And note as I mentioned, the result returned by the function is super fast. `SELECT * from get_customer_trans()` returns quickly to the query results grid all the 2.2 million records. So it seems like the function is fast. I want to further filter the result by getting only those customers that are not in do_not_email_tbl. – Horse Voice Jun 08 '13 at 04:45
  • AWEESOMEEEE!!!. Thank you! you sure the logic is correct with the select 1? Im not familiar with NOT EXISTS. – Horse Voice Jun 08 '13 at 05:15
  • It does not really matter what you `SELECT` inside `NOT EXISTS`. It could be anything, like `*` or `0` or any column name, and `1` is simply easiest thing to write there. And `LIMIT 1` is there only to make sub-`SELECT` to stop searching at first match – mvp Jun 08 '13 at 05:18
  • 1
    @mvp My guess is that `not exists` is smart enough to imply `limit 1` – Clodoaldo Neto Jun 08 '13 at 10:29
  • 1
    @ClodoaldoNeto: indeed, the `LIMIT 1` is not needed. THE `(not) exists` subquery evaluates to exactly one boolean value; either one (or more) row(s) exist(s), or it/they don't/doesn't. In any case the subquery results in true or false. – wildplasser Jun 08 '13 at 10:48
  • Is there a way I can do this with joins? Using the exists and not exists is tricky. Can anyone elaborate on how that can be done? – Horse Voice Jun 12 '13 at 14:22
  • 1
    Just used "not exists" instead of "not in" on postgresql 12 and it made a huge difference. – Shadi Oct 03 '22 at 20:11
7

Just do it this way:

SELECT * FROM get_customer_trans() as t1 left join do_not_email_tbl as t2 
on user_email = do_not_email_address
where t2.do_not_email_address is null
Rocketq
  • 5,423
  • 23
  • 75
  • 126