Presto has multiple connectors. While the connectors implement both read and write operations, all the tutorials I have read suggest they are typically used only as data sources to read from. For example, Netflix has "10 petabyte" of data on Amazon S3 and explicitly states that no disk (and no HDFS) is used on the Presto worker nodes. The stated use case is "ad hoc interactive" queries.
Also, Amazon Athena is essentially S3+Presto and comes with similar use cases.
I'm puzzled as to how this can work in practice. Obviously, you don't want to read 10 PB of data on every query. So I assume you want to keep some previously fetched data in memory, much like a database index. However, with no constraints on the data and the queries, I fail to understand how this can be efficient.
Use case 1: I run the same query frequently, e.g. to show metrics on a dashboard. Does Presto avoid rescanning the data points that are already 'known'?
Use case 2: I'm analysing a large data set. Each query is slightly different, but there are common subqueries, or we filter down to a common subset of the data. Does Presto learn from previous queries and carry over intermediate results?
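To make that concrete, here is the kind of pattern I mean (the `events` table and its columns are made up for illustration):

```sql
-- Both queries scan the same filtered subset of a hypothetical events table.
SELECT user_id, count(*) AS views
FROM events
WHERE event_date >= DATE '2023-01-01' AND country = 'US'
GROUP BY user_id;

SELECT device, count(*) AS views
FROM events
WHERE event_date >= DATE '2023-01-01' AND country = 'US'
GROUP BY device;
```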
Or, if this is not the case, would I be well advised to store intermediate results somewhere (e.g. CREATE TABLE AS ...)?
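A rough sketch of what I have in mind (again, the `events` table and the `hive.tmp` catalog/schema names are placeholders):

```sql
-- Materialize the common subset once, written back through a connector
-- (e.g. a Hive table backed by S3) ...
CREATE TABLE hive.tmp.events_us_2023 AS
SELECT user_id, device, event_date
FROM events
WHERE event_date >= DATE '2023-01-01' AND country = 'US';

-- ... then run the cheaper follow-up queries against the smaller table.
SELECT user_id, count(*) AS views
FROM hive.tmp.events_us_2023
GROUP BY user_id;
```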