Questions tagged [presto]

Presto is an open source distributed SQL query engine for running analytic queries against data sources of all sizes ranging from gigabytes to petabytes. The community version of Presto is now called Trino. Amazon serverless query service called Athena is using Presto under the hood.

What is Presto?

Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes.

Presto was designed and written from the ground up for interactive analytics and approaches the speed of commercial data warehouses while scaling to the size of organizations like Facebook.

What can it do?

Presto allows querying data where it lives, including Hive, HBase, relational databases or even proprietary data stores. A single Presto query can combine data from multiple sources, allowing for analytics across your entire organization.

Presto is targeted at analysts who expect response times ranging from sub-second to minutes. Presto breaks the false choice between having fast analytics using an expensive commercial solution or using a slow "free" solution that requires excessive hardware.

References

3114 questions
16
votes
3 answers

Casting unix time to date in Presto

I have timestamps stored in time since epoch (ms) and I would like to query and display results using a date formatted like 'yyyy-mm-dd'.
deltap
  • 4,176
  • 7
  • 26
  • 35
16
votes
6 answers

Cumulative distinct count

I am working on query to get cumulative distinct count of uids on daily basis. Example : Say there are 2 uids (100,200) appeared on date 2016-11-01 and they also appeared on next day with new uid 300 (100,200,300) on 2016-11-02 At this point i…
user1495744
  • 199
  • 1
  • 1
  • 5
16
votes
1 answer

Select Top # in Presto SQL?

Does Presto SQL really lack TOP X functionality in SELECT statements? If so, is there a workaround in the meantime? https://prestodb.io/
Phillip Deneka
  • 221
  • 1
  • 3
  • 12
15
votes
1 answer

Difference between "ROWS BETWEEN" and "RANGE BETWEEN" in (Presto) window function "OVER" clause

This question is primarily about older versions of PrestoSQL, which have been resolved in the (now renamed) Trino project as of versions 346. However, Amazon's Athena project is based off of Presto versions 0.217 (Athena Engine 2) and 0.172 (Athena…
mbafford
  • 2,266
  • 1
  • 20
  • 25
15
votes
4 answers

Athena date format unable to convert string to date formate

tried the below syntax none of them helped to convert a string type column to date select INVC_,APIDT,APDDT from APAPP100 limit 10 select current_date, APIDT,APDDT from APAPP100 limit 10 select date_format( b.APIDT, '%Y-%m-%d') from APAPP100…
vinsent paramanantham
  • 953
  • 3
  • 15
  • 34
15
votes
1 answer

Finding out the data type of a value in Presto

I'm getting values from nested maps and it's hard to figure out what data type each value is. Is there a typeof function that can tell me the data type of each value?
Leo Jiang
  • 24,497
  • 49
  • 154
  • 284
14
votes
2 answers

AWS Athena - GENERIC_INTERNAL_ERROR: Number of partition values does not match number of filters

I'm querying a table in Athena that is giving the error: GENERIC_INTERNAL_ERROR: Number of partition values does not match number of filters I was able to query it earlier, but added another partition (AWS glue job) to try and optimize joins I will…
Neil Galloway
  • 141
  • 1
  • 1
  • 5
14
votes
4 answers

Converting bigint to timestamp in presto

I have a column in my dataset that has a datatype of bigint: Col1 Col2 1 1519778444938790 2 1520563808877450 3 1519880608427160 4 1520319586578960 5 1519999133096120 How do I convert Col2 to the following…
nak5120
  • 4,089
  • 4
  • 35
  • 94
14
votes
2 answers

Athena create table from parquet schema

Is there a way to create a table in Amazon Athena directly from parquet file based on avro schema? The schema is encoded into the file so its seems stupid that I need to actually create the DDL myself. I saw this and also another duplication but…
NetanelRabinowitz
  • 1,534
  • 2
  • 14
  • 26
14
votes
1 answer

Show tables from all schemas in Presto

In Presto SHOW SCHEMAS; returns all schemas SHOW TABLES FROM foo; returns all tables for foo schema Is there a simple way to return tables from all schemas in Presto?
Gregology
  • 1,625
  • 2
  • 18
  • 32
14
votes
3 answers

How to cross join unnest a JSON array in Presto

Given a table that contains a column of JSON like this: {"payload":[{"type":"b","value":"9"}, {"type":"a","value":"8"}]} {"payload":[{"type":"c","value":"7"}, {"type":"b","value":"3"}]} How can I write a Presto query to give me the average b value…
mark
  • 4,678
  • 7
  • 36
  • 46
13
votes
1 answer

How to compute a median in PrestoSQL?

It seems like there is no native function for that purpose in Presto SQL. Do you know any way to efficiently aggregate a group and return its median?
Roméo Després
  • 1,777
  • 2
  • 15
  • 30
13
votes
1 answer

AWS Athena partition fetch all paths

Recently, I've experienced an issue with AWS Athena when there is quite high number of partitions. The old version had a database and tables with only 1 partition level, say id=x. Let's take one table; for example, where we store payment parameters…
null
  • 1,944
  • 1
  • 14
  • 24
13
votes
2 answers

Combine rows into a list in PrestoSQL

Currently, my table has three different fields, id1, id2 and actions. action is of type string. For example, my table looks something like the table given below: id1 | id2 | actions --------------------------- "a1" "a2" …
Telepresence
  • 619
  • 2
  • 7
  • 22
13
votes
2 answers

Presto: Cast timestamp w/TZ to plain timestamp WITHOUT converting to UTC

This query in Presto: select *, cast(ts_w_tz as timestamp) as ts, cast(substr(cast(ts_w_tz as varchar), 1, 23) as timestamp) as local_ts_workaround from (select timestamp '2018-02-06 23:00:00.000 Australia/Melbourne' as ts_w_tz); Returns: …
EvilPuppetMaster
  • 8,072
  • 10
  • 34
  • 31