Questions tagged [presto]

Presto is an open source distributed SQL query engine for running analytic queries against data sources of all sizes, from gigabytes to petabytes. The community fork formerly known as PrestoSQL is now called Trino. Amazon's serverless query service, Athena, uses Presto under the hood.

What is Presto?

Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes.

Presto was designed and written from the ground up for interactive analytics and approaches the speed of commercial data warehouses while scaling to the size of organizations like Facebook.

What can it do?

Presto allows querying data where it lives, including Hive, HBase, relational databases or even proprietary data stores. A single Presto query can combine data from multiple sources, allowing for analytics across your entire organization.

Presto is targeted at analysts who expect response times ranging from sub-second to minutes. Presto breaks the false choice between having fast analytics using an expensive commercial solution or using a slow "free" solution that requires excessive hardware.


3114 questions
10 votes, 1 answer

How does S3 Select pricing work? What do "data returned" and "data scanned" mean in S3 Select?

I have 1M rows of CSV data. If I select 10 rows, will I be billed for only those 10 rows? What do "data returned" and "data scanned" mean in S3 Select? There is little documentation on these S3 Select terms.
10 votes, 1 answer

Presto map(varchar,varchar): how to get all the possible keys?

I am trying to search a column of data type map(varchar,varchar). One way to access the column is name_of_column['key'], which returns the value for that key. But I want to know what the possible keys are and then…
arman
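For the map question, Presto's map_keys(m) function returns a map's keys as an array, so one common approach is to unnest those arrays and deduplicate them, e.g. SELECT DISTINCT k FROM t CROSS JOIN UNNEST(map_keys(name_of_column)) AS x(k), where t is a placeholder table name. The Python sketch below models only that union-of-keys logic on hypothetical in-memory rows; it illustrates the idea and is not Presto code.

```python
from typing import Dict, Iterable, List


def distinct_map_keys(rows: Iterable[Dict[str, str]]) -> List[str]:
    """Collect every key that appears in any row's map, like UNNEST(map_keys(col)) + DISTINCT."""
    keys = set()
    for m in rows:
        keys.update(m.keys())  # corresponds to map_keys(m) for one row
    return sorted(keys)


# Hypothetical sample values of a map(varchar, varchar) column
rows = [
    {"color": "red", "size": "M"},
    {"size": "L"},
    {"brand": "acme", "color": "blue"},
]
print(distinct_map_keys(rows))  # ['brand', 'color', 'size']
```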
9 votes, 3 answers

AWS Athena: alias in GROUP BY does not get resolved

I have a very basic GROUP BY query in Athena where I would like to use an alias. One can make the example work by repeating the same expression in the GROUP BY, but that's not really handy when there are complex column modifications going on and logic…
supernova
9 votes, 3 answers

How to extract the month name from a string column in Athena

SELECT sales_invoice_date, MONTH( DATE_TRUNC('month', CASE WHEN TRIM(sales_invoice_date) = '' THEN DATE('1999-12-31') ELSE …
Ray
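Assuming sales_invoice_date holds strings like '2019-07-04' (the excerpt does not show the format), Presto can parse and format in one step with date_format(date_parse(sales_invoice_date, '%Y-%m-%d'), '%M'), where %M is the month-name specifier. The Python sketch below mirrors that logic, including the question's fallback to 1999-12-31 for blank strings; the English names are hard-coded so the result does not depend on the system locale.

```python
from datetime import date, datetime

MONTH_NAMES = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def month_name(raw: str) -> str:
    """Month name for a 'YYYY-MM-DD' string; blanks fall back to 1999-12-31 as in the question."""
    text = raw.strip()
    if text == "":
        d = date(1999, 12, 31)
    else:
        d = datetime.strptime(text, "%Y-%m-%d").date()
    return MONTH_NAMES[d.month - 1]


print(month_name("2019-07-04"))  # July
print(month_name("   "))         # December
```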
9 votes, 1 answer

How to call the date_trunc function in Amazon Athena?

I am trying to select the date_trunc value: select date_trunc(HOUR, current_date - interval '1' hour); or select date_trunc(HOUR, current_date); and got the error: [42703] ERROR: column "hour" does not exist Position: 19
Cherry
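In Presto the first argument of date_trunc is a string literal naming the unit, as in date_trunc('hour', current_timestamp - interval '1' hour); an unquoted HOUR is read as a column reference, which is consistent with the "column does not exist" error. A DATE such as current_date carries no time of day, so a timestamp is the sensible input for hourly truncation. The Python sketch below models what hourly truncation does to a timestamp; the sample timestamp is made up.

```python
from datetime import datetime, timedelta


def trunc_to_hour(ts: datetime) -> datetime:
    """Mimic date_trunc('hour', ts): zero out minutes, seconds and microseconds."""
    return ts.replace(minute=0, second=0, microsecond=0)


now = datetime(2020, 5, 17, 14, 42, 7)  # hypothetical "current" timestamp
print(trunc_to_hour(now - timedelta(hours=1)))  # 2020-05-17 13:00:00
```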
9 votes, 1 answer

Condensing arrays in Presto

I have a query that produces arrays of strings using the array_agg() function: SELECT array_agg(message) as sequence from mytable group by id which produces a table that looks like this: sequence 1 foo foo bar baz bar baz 2 …
iskandarblue
9 votes, 0 answers

Reusing subqueries in AWS Athena generates a large amount of data scanned

On AWS Athena, I am trying to reuse computed data using a WITH clause, e.g. WITH temp_table AS (...) SELECT ... FROM temp_table t0, temp_table t1, temp_table t2 WHERE ... If the query is fast, the "Data scanned" goes through the roof. As if…
user7094
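A plausible cause is that Presto has generally expanded a WITH query inline at each place it is referenced instead of materializing it once, so referencing temp_table three times (t0, t1, t2) can scan the underlying data three times. The Python sketch below is only a toy cost model of that difference; the Source class and both helper functions are hypothetical and do not reproduce Athena's planner.

```python
from typing import Callable, List


class Source:
    """Stand-in for the base table; counts how many times it is scanned."""

    def __init__(self, rows: List[int]) -> None:
        self.rows = rows
        self.scans = 0

    def scan(self) -> List[int]:
        self.scans += 1
        return list(self.rows)


def inlined(cte: Callable[[], List[int]], references: int) -> List[List[int]]:
    # Inlined WITH: every reference re-runs the subquery, i.e. re-scans the source.
    return [cte() for _ in range(references)]


def materialized(cte: Callable[[], List[int]], references: int) -> List[List[int]]:
    # Materialized WITH: run once, then reuse the same result for every reference.
    result = cte()
    return [result for _ in range(references)]


src = Source([1, 2, 3])
inlined(src.scan, 3)
print(src.scans)  # 3

src = Source([1, 2, 3])
materialized(src.scan, 3)
print(src.scans)  # 1
```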
9 votes, 2 answers

Hive and Presto: integer division truncation problem

Why does dividing two bigint values not truncate to an integer in Hive, but does in Presto?
loonglee
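Presto keeps integer types under division: BIGINT / BIGINT yields a BIGINT truncated toward zero, while Hive's / operator returns a DOUBLE for integer operands. Casting one side, e.g. CAST(a AS DOUBLE) / b, gives the Hive-like result in Presto. The Python sketch below models both behaviors; note that Python's own // floors rather than truncates, so it differs from Presto when the result is negative.

```python
def presto_bigint_div(a: int, b: int) -> int:
    """Integer division truncating toward zero, as Presto does for BIGINT / BIGINT."""
    if b == 0:
        raise ZeroDivisionError("Division by zero")
    q = abs(a) // abs(b)
    # Same signs -> positive quotient; different signs -> negative quotient.
    return q if (a >= 0) == (b > 0) else -q


def hive_div(a: int, b: int) -> float:
    """Hive's '/' yields a DOUBLE even for integer operands."""
    return a / b


print(presto_bigint_div(7, 2), hive_div(7, 2))  # 3 3.5
print(presto_bigint_div(-7, 2), -7 // 2)        # -3 -4
```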
9 votes, 1 answer

Presto SQL pivoting (for lack of a better word) data

I am working with some course data in a Presto database. The data in the table looks like: student_id period score completed 1 2016_Q1 3 Y 1 2016_Q3 4 Y 3 2017_Q1 4 Y 4 2018_Q1 2 N I…
aguadamuz
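Assuming the goal is one row per student with one column per period (the excerpt is cut off before the desired output), Presto can do this with conditional aggregation, e.g. max(CASE WHEN period = '2016_Q1' THEN score END) AS q2016_1 for each period, grouped by student_id (the alias is illustrative); map_agg(period, score) is an alternative when the set of periods is not fixed. The Python sketch below applies the same idea to the sample rows from the question.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, str, int, str]  # (student_id, period, score, completed)

# Sample rows taken from the question
rows: List[Row] = [
    (1, "2016_Q1", 3, "Y"),
    (1, "2016_Q3", 4, "Y"),
    (3, "2017_Q1", 4, "Y"),
    (4, "2018_Q1", 2, "N"),
]


def pivot_scores(rows: List[Row]) -> Dict[int, Dict[str, Optional[int]]]:
    """One entry per student, one key per period; periods a student lacks stay None."""
    periods = sorted({period for _, period, _, _ in rows})
    table: Dict[int, Dict[str, Optional[int]]] = {}
    for student_id, period, score, _completed in rows:
        cells = table.setdefault(student_id, dict.fromkeys(periods))
        current = cells[period]
        # Like max(CASE WHEN period = ... THEN score END): keep the largest score seen.
        cells[period] = score if current is None else max(current, score)
    return table


for student_id, cells in pivot_scores(rows).items():
    print(student_id, cells)
```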
9 votes, 1 answer

Presto performance tuning, queries are much slower when performed in parallel

I have a presto cluster configured with 12 workers that is being queried by Java applications. The cluster is capable of performing 30 concurrent requests (if there are more, they are queued). The applications might send around 80-100 distinct…
Sasha Shpota
9 votes, 2 answers

Select rows by index in Amazon Athena

This is a very simple question, but I can't seem to find documentation on it. How would one query rows by index (i.e. select the 10th through 20th rows in a table)? I know there's a row_number function, but it doesn't seem to do what I want.
Ajjit Narayanan
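Presto tables have no inherent row order, so "the 10th row" only has meaning relative to an ORDER BY. The usual pattern numbers rows with row_number() OVER (ORDER BY some_column) in a subquery and filters that number with BETWEEN 10 AND 20, where some_column is a placeholder; the same trick can emulate OFFSET where it is unavailable. The Python sketch below models the numbering and filtering step on made-up data; rows_between is a hypothetical helper.

```python
from typing import Any, Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def rows_between(
    rows: Sequence[T], key: Callable[[T], Any], first: int, last: int
) -> List[Tuple[int, T]]:
    """Number rows 1..n after sorting by `key` (like row_number() OVER (ORDER BY key))
    and keep those whose number lies in [first, last], inclusive like BETWEEN."""
    numbered = enumerate(sorted(rows, key=key), start=1)
    return [(rn, row) for rn, row in numbered if first <= rn <= last]


data = list(range(100, 0, -1))  # hypothetical values stored in arbitrary order
page = rows_between(data, key=lambda v: v, first=10, last=20)
print([v for _, v in page])  # [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
```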
9 votes, 4 answers

How to convert a date format YYYY-MM-DD into integer YYYYMMDD in Presto/Hive?

How to CONVERT a date in format YYYY-MM-DD into integer YYYYMMDD in Presto/Hive? I am trying to convert the below list into YYYYMMDD integers WITH all_dates as (SELECT CAST(date_column AS DATE) date_column FROM (VALUES …
Chris
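A date can be turned into an integer YYYYMMDD with plain arithmetic, year(date_column) * 10000 + month(date_column) * 100 + day(date_column), using the year(), month() and day() functions available in both Presto and Hive; formatting the date as 'YYYYMMDD' and casting to an integer is an equivalent string-based route. The Python sketch below shows both routes on a sample date.

```python
from datetime import date


def to_yyyymmdd(d: date) -> int:
    """Arithmetic route: year * 10000 + month * 100 + day."""
    return d.year * 10000 + d.month * 100 + d.day


def to_yyyymmdd_via_string(d: date) -> int:
    """String route: format as zero-padded YYYYMMDD, then cast to int."""
    return int(f"{d.year:04d}{d.month:02d}{d.day:02d}")


sample = date(2019, 7, 4)
print(to_yyyymmdd(sample), to_yyyymmdd_via_string(sample))  # 20190704 20190704
```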
9 votes, 1 answer

OFFSET on AWS Athena

I would like to run a query on AWS Athena with both a LIMIT and an OFFSET clause. I take it the former is supported while the latter is not. Is there any way of emulating this functionality using other methods?
RoyalTS
9 votes, 3 answers

Issues with JSON_EXTRACT in Presto for keys containing ' ' character

I'm using Presto (0.163) to query data and am trying to extract fields from a JSON document. I have a JSON like the one given below, which is present in the column 'style_attributes': "attributes": { "Brand Fit Name": "Regular Fit", "Fabric":…
Aaquib Khwaja
9 votes, 2 answers

Duplicate results in an AWS Athena (Presto) DISTINCT SQL Query?

I have a bunch of files on S3 that contain just MD5s, one per line. I created an AWS Athena table to run a de-duplication query against the MD5s. In total there are hundreds of millions of MD5s in those files and in the table. Athena Table Creation…
T. Brian Jones