Questions tagged [presto]

Presto is an open source distributed SQL query engine for running analytic queries against data sources of all sizes, ranging from gigabytes to petabytes. The community version of Presto is now called Trino. Amazon's serverless query service, Athena, uses Presto under the hood.

What is Presto?

Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes.

Presto was designed and written from the ground up for interactive analytics and approaches the speed of commercial data warehouses while scaling to the size of organizations like Facebook.

What can it do?

Presto allows querying data where it lives, including Hive, HBase, relational databases or even proprietary data stores. A single Presto query can combine data from multiple sources, allowing for analytics across your entire organization.
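
As a rough sketch of what such a cross-source query can look like: Presto addresses tables as catalog.schema.table, so one statement can join tables from different connectors. The catalog, schema, table, and column names below (hive.sales.orders, mysql.crm.customers, and so on) are hypothetical and assume a Hive catalog and a MySQL catalog have been configured.

```sql
-- Hypothetical catalogs and tables, for illustration only:
--   hive.sales.orders    (a table exposed by a Hive catalog)
--   mysql.crm.customers  (a table exposed by a MySQL catalog)
SELECT c.name,
       count(*) AS order_count
FROM hive.sales.orders AS o
JOIN mysql.crm.customers AS c
  ON o.customer_id = c.customer_id
WHERE o.order_date >= DATE '2020-01-01'
GROUP BY c.name
ORDER BY order_count DESC
LIMIT 10;
```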

Presto is targeted at analysts who expect response times ranging from sub-second to minutes. Presto breaks the false choice between having fast analytics using an expensive commercial solution or using a slow "free" solution that requires excessive hardware.

3114 questions
5 votes, 1 answer

What is the equivalent of Presto UNNEST function in Hive

Presto has an UNNEST function to explode columns made of arrays. Is there a similar one for Hive? See docs for UNNEST function of Presto here.
ishan3243
5 votes, 1 answer

what are the downsides of using Presto in ETL scenarios?

I have read that Presto is meant for ad-hoc querying and that Hive/Spark are more apt for ETL scenarios. It seems the reason not to use Presto in ETL is that Presto queries can fail and there is no mid-query fault tolerance. However, it looks…
user2715182
5 votes, 2 answers

How to query NaN double values in Athena

I need to run a query like this in AWS Athena: SELECT * FROM "hl"."may" where fqk = 'NaN' limit 10
jk1
5 votes, 1 answer

Select first day of month in presto

I'm trying to select the first day of the month for a date value in Presto (Hive Table). I've tried TRUNC(date,'MM') which works in Hive but not Presto.
ericbrownaustin
5 votes, 0 answers

How to handle Hive locking across Hive and Presto

I have a few Hive tables that are written with insert-overwrite from Spark and Hive. Those tables are also accessed by analysts on Presto. Naturally, we're running into windows of time in which users hit an incomplete data set because Presto is ignoring…
5 votes, 2 answers

presto sql filter last 24 hours

I'm trying to get a query that filters the date from the last 24 hours: select * from tb where created_at > DATEADD('hour', -24, now()) limit 100; But I'm getting this error: SYNTAX_ERROR: line 3:24: Function dateadd not registered
Filipe Ferminiano
5 votes, 1 answer

Presto MD5 Hash for multiple Columns in DB Table

I am facing a challenge that's driving me crazy after two hours of trial and error... I need to hash at least two columns of a relational table with presto (actually with Amazon Athena which uses the presto engine). My current state is…
MConan
5 votes, 1 answer

Presto Custom UDF

I've created a custom udf that is registered but when I try to select custom_udf(10) I get the following error: Exact implementation of BasicPlatform do not match expected java types Here is my udf, I can't seem to figure out what is wrong with…
Ace Haidrey
5 votes, 3 answers

How can Presto show partitions before it executes HQL?

I use PyHive to connect to Hive through Presto. How can I find out the partitions of the Hive tables before Presto executes the SQL?
user3065606
5 votes, 2 answers

How to Quickly Flatten a SQL Table

I'm using Presto. If I have a table like: ID CATEGORY VALUE 1 a ... 1 b 1 c 2 a 2 b 3 b 3 d 3 e 3 f How would you convert to the below without writing a case statement for each…
Moosa
5 votes, 2 answers

Presto Interpreter in Zeppelin on EMR

Is it possible to add a Presto interpreter to Zeppelin on AWS EMR 4.3, and if so, could someone please post the instructions? I have Presto-Sandbox and Zeppelin-Sandbox running on EMR.
shomi
5 votes, 1 answer

How to list all Presto workers?

I want to get a list of all connected workers so that I can detect which worker is not working. I tried select * from sys.node; but it doesn't work. I'm using Presto 0.128.
soulmachine
5 votes, 2 answers

PrestoDB EMR Server refused connection

I have set up an EMR cluster in AWS with PrestoDB installed on it. Earlier I was able to query with PrestoDB, but after a restart it stopped working and started giving the following error: "Error running command: Server refused connection:…
Nirdesh
5 votes, 1 answer

What are the fundamental architectural, SQL compliance, and data use scenario differences between Presto and Impala?

Can some experts give succinct answers on the differences between Presto and Impala from these perspectives? Fundamental architecture design; SQL compliance; real-world latency; any SPOF or fault-tolerance functionality; structured and…
Yellow Duck
5 votes, 2 answers

How much data do I need to have to make use of Presto?

How much data do I need to have to make use of Presto? The web site states that it can query data sizes from gigabytes to petabytes. I understand how it is used to query very large datasets, but is anyone using it for hundreds of gigabytes?
mpodrazik