9

On AWS Athena, I am trying to reuse computed data using a WITH clause, e.g.

WITH temp_table AS (...)
SELECT ...
FROM temp_table t0, temp_table t1, temp_table t2
WHERE ...

If the query is fast, the "Data scanned" goes through the roof. As if temp_table is computed for each time it is reference in the FROM clause.

I don't see the issue if I create a temp table separately and use it multiple times in the query.

Is there a way to really reuse a subquery multiple times without any penalty?

Piotr Findeisen
  • 19,480
  • 2
  • 52
  • 82
user7094
  • 215
  • 2
  • 7
  • 5
    Unfortunately this is how inline views work in Presto. This is tracked as https://github.com/prestosql/presto/issues/10 – Piotr Findeisen Feb 21 '19 at 22:34
  • Thank you very much for the reply. I was hoping to avoid creating temp tables and perform the whole operation in one single SQL query. But it seems like Presto won't let me do it. – user7094 Feb 22 '19 at 18:40
  • 2
    Don't think of them as temp tables, that you create for a single query and then delete them. Athena is not a standard DB with a single node, indexes, and lots of RAM in that instance. It is a distributed compute engine that scans hundreds of files in various locations. Analyze your queries and design the paths of aggregation, partition and other optimization that you can reuse in your queries. – Guy Oct 09 '19 at 05:56

0 Answers0