There is a table with parquet data format of 20 GB and simple query will give results by scanning only 1GB of data.
select columns from table1 where id in (id1, id2, idn)
If same query is executed with a sub-query such as -
select columns from table1 where id in (select id from table2 limit n)
This query will give results by scanning 20GB, whole the table.Even n is very small number as 10, 50 or 5000.
Same happen with LEFT JOIN.
SELECT table1.* FROM
table2 LEFT JOIN table1
ON table2.id=table1.id
Is there a way to achieve this by running single query instead of fetch and save result of sub-query and pass as args into another query? Any best practices of How currently users runs LEFT JOIN or sub-query without full table scan on Athena ?
Similar questions- Question -1, Question -2