0

In TiDB, how can I execute the same SELECT statement for every element of a set and union the results?

E.g. suppose I have this query which takes given_id and returns <= 3 ids:

SELECT id 
FROM my_table 
WHERE id > given_id 
ORDER BY id 
LIMIT 3;

How do I execute this query for every given_id in a set of given_ids? In pseudocode, this would be like:

results = set()
for given_id in given_ids:
  results += run_sql("""
SELECT id 
FROM my_table 
WHERE id > %s
ORDER BY id 
LIMIT 3;
""" % given_id)

I'm aware of the UNION keyword, but IIUC to use it you need to construct an expression with size linear in the number of given_ids, which seems bad. Is this something which requires defining SQL functions?

EDIT: Thanks to Tim Biegeleisen for giving an answer that uses the particular logic of the example query to solve the problem. BUT, I'm wondering if there is a general way to execute a parameterized SELECT statement on a set of inputs and then union the result.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
emchristiansen
  • 3,550
  • 3
  • 26
  • 40
  • Add tag for RDBMS used. – June7 Feb 28 '23 at 03:28
  • When asking a SQL question, it is important to add a tag for the RDBMS you're using, as syntax and functionality varies between them. This is mentioned in the SQL tag description. Adding the proper RDBMS tag saves time for both you and the people who answer your question, especially when they waste it by writing an answer only to find out it won't work for your RDBMS. Please [edit] to add that tag. – Ken White Feb 28 '23 at 03:30
  • 1
    I'm using TiDB, which I think is mostly compliant with MySQL 8.0 (edited). – emchristiansen Feb 28 '23 at 04:27
  • _if there is a general way to execute a parameterized SELECT statement on a set of inputs and then union the result._ ... yes, you write a script, stored procedure or application to do exactly that. – Chris Schaller Feb 28 '23 at 04:58
  • It looks to me that you intended to do a prepared statement in your example, but that you actually didn't do that. The `run_sql(query % param)` will do the string interpolation before sending it to the database driver or server. The solution would be to do `run_sql(query, (param, ))`. The exact details depend on the database driver etc. This can help with security and performance. – Daniël van Eeden Feb 28 '23 at 06:49
  • Hey Daniel, to be clear that's just pseudocode to explain the semantics of what I'm trying to do. It's not the actual code that I'm using. – emchristiansen Feb 28 '23 at 17:35

2 Answers2

2

I'm wondering if there is a general way to execute a parameterized SELECT statement on a set of inputs and then union the result.

You cannot make a single query that runs a variable number of SELECT statements depending on the values it reads. The query must be fixed at the time it is prepared, which is before it starts reading data.

You could do it with two queries. The first to read the data values, then use the results to format a second query with as many unioned SELECT statements as the number of distinct values.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • Thanks Bill. Am I right in thinking it's a bad idea to generate a massive query with ~1000 distinct select statements, all unioned together? I'm concerned this may put too much load on the compiler and planner. But maybe this is just how you do it in SQL? – emchristiansen Feb 28 '23 at 17:33
  • More to the point, it's unnecessary, as Tim's answer shows. – Bill Karwin Feb 28 '23 at 17:38
  • I have no idea if TiDB has a limit on the number of UNIONed queries, or a practical limit given resource use. This depends on the implementation. You should test it before you rely on it. – Bill Karwin Feb 28 '23 at 17:40
  • Bill, Tim's answer only works for the select statement I gave as an example. My question asks if there is a general solution. – emchristiansen Feb 28 '23 at 17:53
  • No, there isn't a general solution to solve any arbitrary task in a single SQL query. – Bill Karwin Feb 28 '23 at 18:10
1

If your database support ROW_NUMBER(), then using it would be one option:

WITH cte AS (
    SELECT t.*, ROW_NUMBER() OVER (PARTITION BY given_id ORDER BY id) rn
    FROM my_table t
    WHERE id > given_id
)

SELECT id, given_id
FROM cte
WHERE rn <= 3
ORDER BY given_id, id;
Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • Thanks, that seems to me like it would work (I'll test it tomorrow). But I'm surprised that the answer is this complicated, and that it depends on the particular logic of the select statement I described. Is there not a generic way to say "Run this select for these N values, and return the results"? – emchristiansen Feb 28 '23 at 04:37
  • Limits operate on a single query, `MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'` so you are actually asking for multiple queries, which you do by writing out the individual queries, or by using _window queries_ like this to create sub-sets and row numbering to turn limits into filters. – Chris Schaller Feb 28 '23 at 04:56
  • So this is actually a rather elegant and less complicated solution that others like dynamic SQL – Chris Schaller Feb 28 '23 at 04:56
  • Hey Tim, I'm not sure how you propose to introduce the `given_id`s in your example, and as written your statement doesn't quite work: https://www.db-fiddle.com/f/2T8hjfgTPf3LXdukC8cu2x/0 – emchristiansen Feb 28 '23 at 17:51
  • @emchristiansen Sorry, I had a slight problem in the CTE. – Tim Biegeleisen Feb 28 '23 at 23:42
  • @TimBiegeleisen No worries, and I'm very grateful for the help! I'm just trying to understand how you intend to introduce the `given_id`s. I.e. how do you partition Table A using values from Table B (currently trying to figure it out). – emchristiansen Mar 01 '23 at 00:10