Cost of WHERE filter statements in PrestoSQL

Question

Between the following two queries

  SELECT 
  CASE WHEN col1=1  THEN "one"
       WHEN col1=2  THEN "two"
       ELSE "whatever" 
       END AS numb from TABLE1 WHERE col1=1 or col1=2


  SELECT 
  CASE WHEN col1=1  THEN "one"
       WHEN col1=2  THEN "two"
       ELSE "whatever" 
       END AS numb from TABLE1

Are there any significant differences between two in terms of time complexity? I know the output table for the first table will be smaller due to the condition under WHERE filter, but would it also take more time due to the WHERE filter?

I wanna know how much time WHERE filter would consume.

Any help would be greatly appreciated.

@tinazmu could you elaborate? how is TABLE1 contains a lot of columns vs. TABLE1 contains col1 and just a few others different? — user98235, Mar 08 '23 at 03:25
It is more about the distribution of values across rows. If the table has millions of rows with only a few rows with col1=1 or col1=2 then the first version would work better (CPU effort to check for the condition pays off because you don't have to retrieve unwanted/irrelevant rows), if only a few rows have not (col1=1 or col1=2) then the other version works better (ie the effort to eliminate non-matching rows has little return). It also depends on if you have an index on Col1. — tinazmu, Mar 08 '23 at 03:35
If there is an index on col1 then the first query will be significantly faster. If there isn’t then both would require a full table scan and would take approximately the same amount of time — NickW, Mar 08 '23 at 07:51

score 0 · Accepted Answer · answered Mar 08 '23 at 15:35

Are there any significant differences between two in terms of time complexity?

Both are O(n) in terms of time complexity - they perform constant amount of work per row (since Presto/Trino is not actually a database and does not maintain/support indexes).

but would it also take more time due to the WHERE filter

Yes, but I would argue that in real life scenarios it should be quite insignificant compared to the rest of the processing (if the filter is "useless" - i.e. (almost)all the data should be present in the output though it potentially can be better to skip it of course).

I wanna know how much time WHERE filter would consume.

Use EXPLAIN ANALYZE on your data and query to see the cost in your case.

could you also take a look at https://stackoverflow.com/questions/75680237/creating-a-new-column-using-existing-columns-but-in-non-exclusive-conditions ? — user98235, Mar 09 '23 at 04:24
@user98235 done =) Also note that you have not specified SQL engine in it. — Guru Stron, Mar 09 '23 at 05:20

Cost of WHERE filter statements in PrestoSQL

1 Answers1