
Whenever I use a small statement, for example:

DELETE FROM c_ordertax WHERE (c_order_id,c_tax_id) IN ((183691598,1000862),(183691198,1000862));

It executes perfectly. But if I execute a lengthy statement to delete, say, 18755 records with these scalar values, it fails with a "max_stack_depth exceeded" error. This option in postgresql.conf has been set to 2MB, and the query that threw the error doesn't even amount to 2MB; it's just 300kB.

Note: no triggers are attached to the table.

One thing I noticed about other queries: when I use a single value in the IN clause, e.g. DELETE FROM c_ordertax WHERE (c_order_id) IN ((183691598),(183691198));, they don't have any issues, and however lengthy the query may be, it executes perfectly.

My current options are:

  1. I could increase the "max_stack_depth" value, but it is limited to 8MB, and increasing it further causes issues: the PostgreSQL server couldn't restart. It only restarts properly if the option is set to a value less than 8MB.
  2. I could split up those statements, but that might not be a graceful solution. It also requires me to know the maximum number of scalar values that can be accommodated in a single statement, and I fear that if the number of fields per scalar value increases, the total number of values that can be used in a single statement could shrink.
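For what it's worth, option 2 can be sketched with a small helper that chunks the (c_order_id, c_tax_id) pairs and emits one DELETE per chunk. This is a hypothetical Python sketch; the helper name and the batch size of 1000 are arbitrary assumptions, not a PostgreSQL limit:

```python
def batched_deletes(pairs, batch_size=1000):
    """Split (order_id, tax_id) pairs into chunks and build one
    DELETE statement per chunk, keeping each IN list small."""
    statements = []
    for i in range(0, len(pairs), batch_size):
        chunk = pairs[i:i + batch_size]
        values = ",".join(f"({o},{t})" for o, t in chunk)
        statements.append(
            f"DELETE FROM c_ordertax WHERE (c_order_id,c_tax_id) IN ({values});"
        )
    return statements

# Three pairs with batch_size=2 yield two statements.
stmts = batched_deletes([(1, 10), (2, 20), (3, 30)], batch_size=2)
```

In real application code you would run each statement in the same transaction (and pass the values as bound parameters rather than interpolating them), but this shows the shape of the split.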

So my question is: what is the maximum number of scalar values that can be used in an IN clause? And if the number of fields per scalar value increases, is there a formula to determine the maximum number of scalar values that can be used? E.g.:

5 values with 2 fields => ((1,2),(1,2),(1,2),(1,2),(1,2))
2 values with 3 fields => ((1,2,3),(1,2,3))

Has any database mastermind encountered these kinds of issues? If so, how do I tackle them?

Crystal Paladin
  • @a_horse_with_no_name, lol, that's what I suggested my clients but it seems they're afraid that the current ERP system might not work and a lot of time needs to be invested to rewrite code, sql, structure and architecture... might want to brush up my convincing salesman skills XD – Crystal Paladin Apr 05 '19 at 10:55
  • You can have a look here: https://why-upgrade.depesz.com/show?from=9.3.25&to=11.2 to see which (security!) bugfixes you are missing because of not upgrading. Maybe that will convince your client ;) –  Apr 05 '19 at 11:21

2 Answers


It should work if you rewrite the list of scalar values to a VALUES list:

DELETE FROM c_ordertax
USING (
  VALUES
      (183691598, 1000862),
      (183691198, 1000862)
) AS t(ord_id, tax_id)
WHERE c_order_id = t.ord_id
  AND c_tax_id = t.tax_id;

I tried this with 10000 pairs in the VALUES list and it did not throw an error. That was with Postgres 11, however; I don't have 9.3 available right now.

  • This is working!!!! XD you're really a Database Mastermind... :p somehow the db engine accepts the values set in USING block and store it in memory and process it as a table but when used in WHERE IN clause as a composite value pair, it rejects the statement... I often wonder how you DB specialist figure things out so quickly... so cool of you guys... keep it up... and a cookie for you XD – Crystal Paladin Apr 05 '19 at 11:12
  • @CrystalPaladin: see Laurenz' answer for more details on why your query fails, and this doesn't –  Apr 05 '19 at 11:13

The problem is that IN lists of pairs get transformed like this during the parse stage:

EXPLAIN DELETE FROM large WHERE (id, id) IN ((1, 1), (2, 2), (3, 3), (4, 4), (5, 5));
                                                                      QUERY PLAN                                                                       
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Delete on large  (cost=0.00..39425.00 rows=1 width=6)
   ->  Seq Scan on large  (cost=0.00..39425.00 rows=1 width=6)
         Filter: (((id = 1) AND (id = 1)) OR ((id = 2) AND (id = 2)) OR ((id = 3) AND (id = 3)) OR ((id = 4) AND (id = 4)) OR ((id = 5) AND (id = 5)))
(3 rows)

If the list consists of scalars, PostgreSQL can do better:

EXPLAIN DELETE FROM large WHERE id IN (1, 2, 3, 4, 5);
                          QUERY PLAN                           
---------------------------------------------------------------
 Delete on large  (cost=0.00..20675.00 rows=5 width=6)
   ->  Seq Scan on large  (cost=0.00..20675.00 rows=5 width=6)
         Filter: (id = ANY ('{1,2,3,4,5}'::integer[]))
(3 rows)

The second version will run with large lists, but the first will run into the stack depth limit during the recursive parse procedure.
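To make the rewrite above concrete, here is a rough Python illustration of how each pair becomes an AND group chained with OR. This is an approximation of the transformed filter shown in the plan, not PostgreSQL's actual parser code:

```python
def rewrite_pair_in(columns, rows):
    """Approximate the parse-time rewrite of
    (col1, col2, ...) IN ((a, b, ...), ...)
    into an OR chain of AND groups, as seen in the EXPLAIN filter."""
    groups = []
    for row in rows:
        conj = " AND ".join(f"({c} = {v})" for c, v in zip(columns, row))
        groups.append(f"({conj})")
    return " OR ".join(groups)

# Mirrors the filter in the EXPLAIN output above for the first two pairs:
expr = rewrite_pair_in(["id", "id"], [(1, 1), (2, 2)])
# → ((id = 1) AND (id = 1)) OR ((id = 2) AND (id = 2))
```

Each additional pair adds another OR node to the expression tree, so its depth grows linearly with the list length, which is what eventually exhausts max_stack_depth.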

I am not sure if that can be improved, but it will probably not be seen as a case worth spending a lot of effort on. You can always rewrite your query as "a_horse_with_no_name" suggested.

Usually, if you have long IN lists like that, you are probably doing something wrong, like trying to perform a join outside the database.

Laurenz Albe
  • good to know... I'll keep this in mind... I won't consider using IN clause if conditional values are composite... thx for explaining in detail – Crystal Paladin Apr 05 '19 at 11:16