Questions tagged [presto]

Presto is an open source distributed SQL query engine for running analytic queries against data sources of all sizes, ranging from gigabytes to petabytes. The community version of Presto is now called Trino. Amazon's serverless query service, Athena, uses Presto under the hood.

What is Presto?

Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes.

Presto was designed and written from the ground up for interactive analytics and approaches the speed of commercial data warehouses while scaling to the size of organizations like Facebook.

What can it do?

Presto allows querying data where it lives, including Hive, HBase, relational databases or even proprietary data stores. A single Presto query can combine data from multiple sources, allowing for analytics across your entire organization.

Presto is targeted at analysts who expect response times ranging from sub-second to minutes. Presto breaks the false choice between having fast analytics using an expensive commercial solution or using a slow "free" solution that requires excessive hardware.

3114 questions
5
votes
1 answer

Convert varbinary to varchar with encoding in Presto SQL and AWS Athena

I'm using AWS Athena. I have a string field which holds the base64 encoding of a DOMString produced by JavaScript's btoa (so it is not a UTF-8 string but a 16-bit-encoded string). So, the string Fútbol España is stored as Rvp0Ym9sIEVzcGHxYQ== (and not…
Yossi Vainshtein
  • 3,845
  • 4
  • 23
  • 39
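
The sample value in the excerpt above can be inspected outside Athena. The following is a minimal Python sketch (Python rather than Presto SQL, used only for illustration) that decodes the base64 value from the excerpt and reads each byte as one character. Treating the bytes as Latin-1 (ISO-8859-1) is an assumption based on how `btoa` maps each character to a single byte; the excerpt itself does not name the encoding.

```python
import base64

# Sample value copied from the question excerpt.
encoded = "Rvp0Ym9sIEVzcGHxYQ=="

raw = base64.b64decode(encoded)

# Assumption: every byte is one Latin-1 (ISO-8859-1) code point.
text = raw.decode("latin-1")
print(text)

# The same bytes are not valid UTF-8, because 0xFA cannot start a UTF-8 sequence.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print(utf8_ok)
```

Under that assumption, a conversion inside the query engine would need to map each byte to the code point of the same value rather than decode the bytes as UTF-8.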
5
votes
1 answer

Presto SQL - How can I get all possible combinations of an array?

I want all possible combinations of n elements from a given array. I tried using some of Presto's predefined functions, such as array_agg(x). Input: [1,2,3,4]. Output when n=2: [[1,2],[1,3],[1,4],[2,3],[2,4],[3,4]]; when n=3:…
Arun Kumar Dave
  • 121
  • 1
  • 4
  • 9
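
To pin down the output the question above asks for, here is a small Python sketch of n-element combinations. It only illustrates the expected result for the input given in the excerpt and is not a Presto SQL solution; the helper name `array_combinations` is hypothetical.

```python
from itertools import combinations


def array_combinations(values, n):
    """Return every n-element combination of values, preserving input order."""
    return [list(combo) for combo in combinations(values, n)]


# Input taken from the question excerpt.
pairs = array_combinations([1, 2, 3, 4], 2)
triples = array_combinations([1, 2, 3, 4], 3)
print(pairs)
print(triples)
```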
5
votes
1 answer

Are Amazon Athena views actually hive views, or are they a separate bolt-on?

Amazon Athena is based on Presto. Amazon Athena supports views. Presto does not support Hive views because it doesn't want to deal with Hive Query Language. Since a view is actually a Hive query, it would have to understand Hive's entire language…
John Humphreys
  • 37,047
  • 37
  • 155
  • 255
5
votes
1 answer

How to do MD5 hashing of a string in Athena?

The MD5 hashing function in Athena is not working for strings. However, Athena's documentation indicates that it should: https://docs.aws.amazon.com/redshift/latest/dg/r_MD5.html Not sure what I am missing here. If I transform varchar to varbinary then the hash…
Rajeev A Nair
  • 191
  • 2
  • 4
  • 11
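
The excerpt above mentions converting varchar to varbinary before hashing. The underlying idea, that MD5 is defined over bytes and a string must be encoded first, can be sketched in Python. This is an illustration only, not Athena syntax; the helper name `md5_hex` is hypothetical and the choice of UTF-8 is an assumption.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Hash a string by first encoding it to bytes (UTF-8 is an assumption)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


digest = md5_hex("hello")
print(digest)
```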
5
votes
2 answers

Does Presto SQL support recursive queries using CTEs, as SQL Server does (e.g. for employee hierarchy levels)?

I want to write a recursive query using a CTE in Presto to find the employee hierarchy. Does Presto support recursive queries? When I write a simple recursion such as with cte as(select 1 n union all select cte.n+1 from cte where n<50) select…
vaibhav
  • 87
  • 2
  • 6
5
votes
2 answers

How to query and iterate over array of structures in Athena (Presto)?

I have an S3 bucket with 500,000+ JSON records, e.g. { "userId": "00000000001", "profile": { "created": 1539469486, "userId": "00000000001", "primaryApplicant": { "totalSavings": 65000, "incomes": [ { "amount":…
tea
  • 568
  • 2
  • 10
  • 18
5
votes
4 answers

Format int as date in presto SQL

I have an integer date column "date_created" storing values like 20180527, 20191205, 20200208, and I am wondering what the best way to parse it as a date is, so that I could do something like this in a query: select * from table where…
d3wannabe
  • 1,207
  • 2
  • 19
  • 39
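
The parsing step asked about above amounts to reading the integer as a `yyyyMMdd` string. Here is a minimal Python sketch of that idea, using the sample values from the excerpt. It is not Presto SQL, and the helper name `int_to_date` is hypothetical.

```python
from datetime import date, datetime


def int_to_date(value: int) -> date:
    """Interpret an integer such as 20180527 as a calendar date (yyyyMMdd)."""
    return datetime.strptime(str(value), "%Y%m%d").date()


# Sample values taken from the question excerpt.
parsed = [int_to_date(v) for v in (20180527, 20191205, 20200208)]
print(parsed)

# Once parsed, the values compare as dates rather than as integers.
before_2019 = [d for d in parsed if d < date(2019, 1, 1)]
print(before_2019)
```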
5
votes
3 answers

Spark incremental loading overwrite old record

I have a requirement to do incremental loading into a table using Spark (PySpark). Here's the example:

Day 1
id | value
-----------
1 | abc
2 | def

Day 2
id | value
-----------
2 | cde
3 | xyz

Expected result
id | value
-----------
1 |…
Samuel Chan
  • 231
  • 1
  • 4
  • 11
5
votes
2 answers

Python compiled script giving error of "Can't load plugin: sqlalchemy.dialects:presto"

I compiled .py file with pyinstaller as follows: pyinstaller --hidden-import presto --hidden-import scipy._lib.messagestream --onefile main.py When I ran the compiled file, I got the error: sqlalchemy.exc.NoSuchModuleError: Can't load plugin:…
nullne
  • 511
  • 6
  • 12
5
votes
2 answers

Querying nested JSON structures in AWS Athena

I have a JSON document in the following format, with nested structures: { "id": "p-1234-2132321-213213213-12312", "name": "athena to the rescue", "groups": [ { "strategy_group": "anyOf", "conditions": [ …
erPe
  • 558
  • 2
  • 11
  • 22
5
votes
2 answers

Cumulative sum by id and by month in Presto

In Amazon Athena I have a table that looks like this:

id  amount  date
1   100     2018-04-05
1   50      2018-06-18
2   10      2018-04-23
2   100     2018-04-28
2   50      2018-07-07
2   10      2018-08-08

And I would like a result such as id …
Artem
  • 751
  • 2
  • 10
  • 30
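
One reading of "cumulative sum by id and by month" is: total the amounts per id and month, then keep a running total per id across months. The Python sketch below applies that reading to the rows from the excerpt. The asker's desired output is cut off in the excerpt, so this interpretation is an assumption (months with no rows are simply absent here), and the function name is hypothetical.

```python
from collections import defaultdict

# Rows (id, amount, date) copied from the question excerpt.
rows = [
    (1, 100, "2018-04-05"),
    (1, 50, "2018-06-18"),
    (2, 10, "2018-04-23"),
    (2, 100, "2018-04-28"),
    (2, 50, "2018-07-07"),
    (2, 10, "2018-08-08"),
]


def cumulative_by_id_and_month(rows):
    """Sum amounts per (id, month), then accumulate the monthly sums per id."""
    monthly = defaultdict(int)
    for id_, amount, day in rows:
        monthly[(id_, day[:7])] += amount  # day[:7] is the "YYYY-MM" prefix

    running = defaultdict(int)
    result = []
    for id_, month in sorted(monthly):
        running[id_] += monthly[(id_, month)]
        result.append((id_, month, running[id_]))
    return result


totals = cumulative_by_id_and_month(rows)
for row in totals:
    print(row)
```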
5
votes
1 answer

Presto SQL : Changing time zones using time zone string coming as a result of a query is not working

I am connecting to AWS Athena through the Mode Analytics platform and querying a table using its query engine (which is based on Presto 0.172). This table, public.zones, has time zone information stored in a column called time_zone on some regions I am…
ConfusedMan
  • 562
  • 1
  • 4
  • 12
5
votes
1 answer

Clear explanation of all kinds of memory in Presto

I'm so confused about the memory settings in Presto. Please check the list below: query.max-memory query.max-memory-per-node (base config) query.max-total-memory (released in 0.205) resources.reserved-system-memory (admin properties) Memory Pools…
Archon
  • 1,385
  • 1
  • 15
  • 30
5
votes
1 answer

Simple Batch Script for Presto query

I am running a bash script to extract data from a table via presto... ./presto --server myprestoserver:8889 --catalog mycatalog --schema myschema --execute "select * from TABLEResultsAuditLog;" > /mydirectory/audit.dat This command will…
Jason Coleman
  • 51
  • 1
  • 2
5
votes
3 answers

Getting all Buildings in range of 5 miles from specified coordinates

I have a database table Building with these columns: name, lat, lng. How can I get all Buildings within 5 miles of specified coordinates, for example these: -84.38653999999998 33.72024? My attempt, which does not work: SELECT ST_CONTAINS( SELECT…
paul
  • 569
  • 9
  • 13
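
The radius filter asked about above can be sketched in Python with the haversine great-circle distance. This is not the asker's query and not Presto SQL. It assumes the coordinates in the excerpt are given as longitude then latitude, uses a mean Earth radius of 3958.8 miles, and the building rows are hypothetical sample data.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an approximation


def haversine_miles(lat1, lng1, lat2, lng2):
    """Great-circle distance in miles between two (lat, lng) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def buildings_within(buildings, lat, lng, radius_miles=5.0):
    """Return the names of buildings whose distance from (lat, lng) is within the radius."""
    return [
        b["name"]
        for b in buildings
        if haversine_miles(lat, lng, b["lat"], b["lng"]) <= radius_miles
    ]


# Assumption: the excerpt lists longitude first, then latitude.
center_lng, center_lat = -84.38653999999998, 33.72024

# Hypothetical rows standing in for the Building table (name, lat, lng).
buildings = [
    {"name": "near", "lat": 33.75, "lng": -84.39},
    {"name": "far", "lat": 34.50, "lng": -84.39},
]

nearby = buildings_within(buildings, center_lat, center_lng)
print(nearby)
```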