Questions tagged [presto]

Presto is an open source distributed SQL query engine for running analytic queries against data sources of all sizes ranging from gigabytes to petabytes. The community version of Presto is now called Trino. Amazon serverless query service called Athena is using Presto under the hood.

What is Presto?

Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes.

Presto was designed and written from the ground up for interactive analytics and approaches the speed of commercial data warehouses while scaling to the size of organizations like Facebook.

What can it do?

Presto allows querying data where it lives, including Hive, HBase, relational databases or even proprietary data stores. A single Presto query can combine data from multiple sources, allowing for analytics across your entire organization.

Presto is targeted at analysts who expect response times ranging from sub-second to minutes. Presto breaks the false choice between having fast analytics using an expensive commercial solution or using a slow "free" solution that requires excessive hardware.

References

3114 questions
22
votes
6 answers

AWS Athena export array of structs to JSON

I've got an Athena table where some fields have a fairly complex nested format. The backing records in S3 are JSON. Along these lines (but we have several more levels of nesting): CREATE EXTERNAL TABLE IF NOT EXISTS test ( timestamp double, …
Chelsar
  • 373
  • 2
  • 8
21
votes
4 answers

Presto array contains an element that likes some pattern

For example, one column in my table is an array, I want to check if that column contains an element that contains substring "denied" (so elements like "denied at 12:00 pm", "denied by admin" will all count, I believe I will have to use "like" to…
daydayup
  • 2,049
  • 5
  • 22
  • 47
21
votes
3 answers

Does Presto have the equivalent of Hive's SET command

It's very convenient to be able to set script variables. For example, SET start_date = 20151201; SELECT * FROM some_table where date = {$hiveconf:start_date}; Does Presto have this capability?
Kevin Morris
  • 213
  • 1
  • 2
  • 4
20
votes
2 answers

AWS Glue issue with double quote and commas

I have this CSV file: reference,address V7T452F4H9,"12410 W 62TH ST, AA D" The following options are being used in the table definition ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ( 'quoteChar'='\"', …
ln9187
  • 730
  • 1
  • 7
  • 23
19
votes
4 answers

How to extract keys in a nested json array object in Presto?

I'm using the latest(0.117) Presto and trying to execute CROSS JOIN UNNEST with complex JSON array like this. [{"id": 1, "value":"xxx"}, {"id":2, "value":"yy"}, ...] To do that, first I tried to make an ARRAY with the values of id by SELECT…
k-kawa
  • 1,289
  • 2
  • 11
  • 18
18
votes
1 answer

What does MSCK REPAIR TABLE do behind the scenes and why it's so slow?

I know that MSCK REPAIR TABLE updates the metastore with the current partitions of an external table. To do that, you only need to do ls on the root folder of the table (given the table is partitioned by only one column), and get all its partitions,…
gdoron
  • 147,333
  • 58
  • 291
  • 367
17
votes
2 answers

Checking if key exists in Presto value map

I am new to Presto, and can't quite figure out how to check if a key is present in a map. When I run a SELECT query, this error message is returned: Key not present in map: element SELECT value_map['element'] FROM mytable WHERE name =…
iskandarblue
  • 7,208
  • 15
  • 60
  • 130
17
votes
1 answer

Amazon Athena: Convert bigint timestamp to readable timestamp

I am using Athena to query the date stored in a bigInt format. I want to convert it to a friendly timestamp. I have tried: from_unixtime(timestamp DIV 1000) AS readableDate And to_timestamp((timestamp::bigInt)/1000, 'MM/DD/YYYY HH24:MI:SS') at time…
noobeerp
  • 417
  • 2
  • 6
  • 11
17
votes
4 answers

Setup Standalone Hive Metastore Service For Presto and AWS S3

I'm working in an environment where I have an S3 service being used as a data lake, but not AWS Athena. I'm trying to setup Presto to be able to query the data in S3 and I know I need the define the data structure as Hive tables through the Hive…
mhaken
  • 1,075
  • 4
  • 14
  • 28
17
votes
1 answer

substr at Amazon Athena

At Amazon Athena, I want to extract only the character string "2017-07-27" from the character string "2017-07-27 12:10:08". SELECT SUBSTRING (event_datetime.s, 0, 10) FROM production limit 10 I tried it like this which only returns numbers 0 to…
Akihiro Seki
  • 321
  • 2
  • 3
  • 7
17
votes
2 answers

Checking if a map or array is empty in Presto?

How do I check if a map has no keys in Presto? If I have a way to check if an array is empty, I can use the map_keys function to determine if the map is empty.
Leo Jiang
  • 24,497
  • 49
  • 154
  • 284
17
votes
2 answers

AWS Athena and date_format

I have some issue while formatting a timestamp with Amazon Athena service. select date_format(current_timestamp, 'y') Returns just 'y' (the string). The only way I found to format dates in Amazon Athena is trough CONCAT + YEAR + MONTH + DAY…
emmekappa
  • 792
  • 1
  • 6
  • 9
17
votes
3 answers

How to export result of select statement in prestodb.io

Two things. How can I execute an sql statement without "presto-cli-0.56-executable.jar", for example I want to provide a web interface where people could write query and see the output on web How to export the result of the select statement into a…
user3171954
  • 181
  • 1
  • 1
  • 4
16
votes
1 answer

Athena DateDiff

I am trying to do what I think is a simple date diff function but for some reason, my unit value is being read as a column ("dd") so I keep getting a column cannot be resolved error I am using AWS Athena My code is this SELECT "reservations"."id"…
JoeD
  • 223
  • 1
  • 3
  • 5
16
votes
2 answers

Turn Presto columns to rows via rotation

This is the desired input and desired output. I'm unfamiliar with the terms used in SQL or Presto and the documentation seems to point to using map_agg but I think the problem here is dynamically creating columns but was curious if this is possible…
dalanmiller
  • 3,467
  • 5
  • 31
  • 38