Questions tagged [presto]

Presto is an open-source distributed SQL query engine for running analytic queries against data sources of all sizes, ranging from gigabytes to petabytes. The community version of Presto is now called Trino. Amazon's serverless query service, Athena, uses Presto under the hood.

What is Presto?

Presto is an open-source distributed SQL query engine for running interactive analytic queries against data sources of all sizes, ranging from gigabytes to petabytes.

Presto was designed and written from the ground up for interactive analytics and approaches the speed of commercial data warehouses while scaling to the size of organizations like Facebook.

What can it do?

Presto allows querying data where it lives, including Hive, HBase, relational databases or even proprietary data stores. A single Presto query can combine data from multiple sources, allowing for analytics across your entire organization.

Presto is targeted at analysts who expect response times ranging from sub-second to minutes. Presto breaks the false choice between having fast analytics using an expensive commercial solution or using a slow "free" solution that requires excessive hardware.

3114 questions
7 votes, 1 answer

Can you add more than one partition in one "ALTER TABLE" command?

I'm using Amazon Athena to query through some log files stored in an S3 bucket, and am using partitions to section off days of the year for the files I need to query. I was wondering -- since I have a large batch of days to add to my table, could I…
codeadventurer
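
On the multi-partition question above: Hive-style DDL, which Athena follows, accepts several PARTITION clauses in a single ALTER TABLE ... ADD statement. The following is only a minimal sketch that builds such a statement in Python. The table name `logs`, the partition column `dt`, and the `s3://example-bucket/...` locations are hypothetical placeholders, not taken from the question, and the helper does no quoting or escaping.

```python
def add_partitions_sql(table, partition_column, locations):
    """Build one ALTER TABLE statement that adds every partition in `locations`.

    `locations` maps a partition value to its storage location.
    Values are interpolated as-is, so only pass trusted input.
    """
    if not locations:
        raise ValueError("at least one partition is required")
    clauses = [
        f"PARTITION ({partition_column} = '{value}') LOCATION '{location}'"
        for value, location in locations.items()
    ]
    return f"ALTER TABLE {table} ADD IF NOT EXISTS\n  " + "\n  ".join(clauses)


# Hypothetical table, column, and bucket names, for illustration only.
statement = add_partitions_sql(
    "logs",
    "dt",
    {
        "2019-01-01": "s3://example-bucket/logs/2019/01/01/",
        "2019-01-02": "s3://example-bucket/logs/2019/01/02/",
    },
)
print(statement)
```

The generated text can then be submitted through whichever client is already in use. Batching many days into one statement keeps the number of DDL calls down.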
7 votes, 1 answer

AWS Athena flattened data from nested JSON source

I'd like to create a table from a nested JSON in Athena. The solutions described here using tools like hive Openx-JsonSerDe attempt to mirror the JSON data in the SQL statement. I just want to get a few fields from the JSON file and create the…
ebnius
7 votes, 1 answer

redis presto connector corrupt key when redis.key-prefix-schema-table=true for json dataFormat

I am trying to set up a working example of Presto and Redis on my local machine according to the (limited) presto-redis documentation. Summary of problem: When using redis.key-prefix-schema-table=true and prefixing a redis key with dev:simple_table:…
aaroncarsonart
7 votes, 1 answer

Presto on Amazon S3

I'm trying to use Presto on an Amazon S3 bucket, but haven't found much related information on the Internet. I've installed Presto on a micro instance, but I'm not able to figure out how to connect to S3. There is a bucket and there are files in…
Codex
6 votes, 1 answer

Convert array(double) to varchar in Presto

I'm trying to convert Array(double) to varchar in Presto. A sample value: [99.0,98.0,99.0,95.0,99.0,88.0,90.0,79.0,90.0,56.0,90.0,90.0,92.0,90.0,93.0,99.0] I tried the cast function below: cast(colname as varchar) But got this error message: "Cannot…
ali60vip
6 votes, 3 answers

Extract values from a JSON Array in Presto

I have a column with JSON arrays like below: {data=[{"name":"col1","min":0,"max":32,"avg":29}, {"name":"col2","min":1,"max":35,"avg":21}, {"name":"col3","min":4,"max":56,"avg":34}]} I'm trying to parse the array and extract specific values based on…
ali60vip
6 votes, 1 answer

Removing exact duplicate rows from presto

With the following table (assuming it has many other rows and columns), how could I query it while removing duplicates?

order_id  customer_name  amount  bill_type
1         Chris          10      sale
1         Chris          1       tip
1         Chris          10      sale

Note that while all 3 rows…
6 votes, 2 answers

AWS Athena: Filter only numeric entries on a column

I am trying to write a query on AWS Athena that keeps only the numeric entries from a varchar column. However, Athena does not support the ISNUMERIC function. I saw some functions that would be useful, but they are available only for Amazon…
6 votes, 1 answer

NOT IN implementation of Presto v.s Spark SQL

I have a very simple query that shows a significant performance difference when running on Spark SQL and Presto (3 hrs vs. 3 mins) on the same hardware. SELECT field FROM test1 WHERE field NOT IN (SELECT field FROM test2) After some research of…
Bostonian
6 votes, 1 answer

Querying struct fields from AWS Athena/Presto

I'll make a simplified example for this site, but basically I'm trying to write an Athena query (of data loaded by Glue crawler with intent to use in Quicksight) which will allow me to expand a struct inside of a select statement. In my example,…
Larry Anderson
6 votes, 2 answers

Getting day of week from date column in prestosql?

I have a date column called day with values such as 2019/07/22. If I want to create a custom field that translates that date to the actual day of the week, such as Sunday or Monday, how is this possible? I can't seem to find a method that works for Presto SQL.…
Chris90
6 votes, 1 answer

INVALID_FUNCTION_ARGUMENT: Array subscript out of bounds

I'm querying a column with a variable length JSON array. select col.pages[1].name, col.pages[2].name from assoc I get this error when there is only one value in the array. INVALID_FUNCTION_ARGUMENT: Array subscript out of bounds How do I prevent…
wipphd
6 votes, 1 answer

How to group by X minute increments in Presto SQL?

I have a data set that looks like the following:

Name, Timestamp, Period, Value
Apple, 2012-03-22 00:00:00.000, 10, 34
Apple, 2012-03-22 00:06:00.000, 10, 23
Orange, 2012-03-22 00:00:00.000, 5, 3
Orange, 2012-03-22 00:08:00.000, 5, 45

Where the…
TESTER1
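
The bucketing arithmetic behind "group by X minute increments" can be illustrated outside SQL. The sketch below is in Python rather than Presto SQL and assumes the goal is to round each timestamp down to the start of its X-minute window, aligned to midnight, and then group on that value. The question excerpt is truncated, so this reading is an assumption. The helper names are hypothetical, and the rows are the sample rows from the excerpt with the Period column omitted.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def floor_to_minutes(ts, minutes):
    """Round `ts` down to the start of its `minutes`-wide bucket (aligned to midnight)."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = int((ts - midnight).total_seconds())
    width = minutes * 60
    return midnight + timedelta(seconds=(elapsed // width) * width)


def group_by_bucket(rows, minutes):
    """Collect values per (name, bucket start); rows are (name, timestamp, value)."""
    groups = defaultdict(list)
    for name, ts, value in rows:
        groups[(name, floor_to_minutes(ts, minutes))].append(value)
    return dict(groups)


# Sample rows from the excerpt above (Period column left out).
rows = [
    ("Apple", datetime(2012, 3, 22, 0, 0), 34),
    ("Apple", datetime(2012, 3, 22, 0, 6), 23),
    ("Orange", datetime(2012, 3, 22, 0, 0), 3),
    ("Orange", datetime(2012, 3, 22, 0, 8), 45),
]
for key, values in group_by_bucket(rows, 10).items():
    print(key, values)
```

The same floor-then-group idea carries over to SQL: compute the bucket start as an expression and group on it.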
6 votes, 3 answers

How to convert a presto query output to a python data frame

I want to convert my query output to a Python data frame to draw a line graph. import prestodb import pandas as pd conn=prestodb.dbapi.connect( host='10.0.0.101', port=8081, user='hive', catalog='hive', schema='ong', ) cur =…
Naveen Vinayak
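
One common way to approach the question above is to pair the fetched rows with the column names from the cursor's `description` attribute, which Python DB-API 2.0 cursors expose. The sketch below assumes the Presto cursor behaves like any other DB-API cursor. It uses the standard-library `sqlite3` module purely as a stand-in so it runs without a Presto server, and `fetch_records` is a hypothetical helper name.

```python
import sqlite3


def fetch_records(cursor):
    """Return the cursor's result set as a list of {column_name: value} dicts."""
    # Fetch first: some drivers only fill in `description` once results arrive.
    rows = cursor.fetchall()
    columns = [column[0] for column in cursor.description]
    return [dict(zip(columns, row)) for row in rows]


# Stand-in connection; with Presto this would be the cursor from the question.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("SELECT 'a' AS name, 1 AS value UNION ALL SELECT 'b', 2 ORDER BY value")
records = fetch_records(cur)
print(records)
conn.close()
```

With pandas installed, `pd.DataFrame(records)` turns the list of dicts into a data frame whose columns match the query's column names, and the frame can then be plotted.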
6 votes, 4 answers

AWS Athena: Delete partitions between date range

I have an Athena table partitioned by date, like this: 20190218. I want to delete all the partitions that were created last year. I tried the query below, but it didn't work. ALTER TABLE tblname DROP PARTITION (partition1 < '20181231'); ALTER…
sakthi srinivas