How much data do I need to have to make use of Presto?

Question

How much data do I need to have to make use of Presto? The web site states that it can query data sizes from gigabytes to petabytes. I understand how it is used to query very large datasets, but is anyone using it for hundreds of gigabytes?

score 6 · Answer 1 · answered Nov 06 '13 at 21:12

Currently, Presto is most useful if you already have an existing Hive installation. If you are using Hive, you should definitely try Presto. If all your data fits in a relational database like PostgreSQL or MySQL on a single machine, and you are happy with the performance, then keep using that.

However, Presto should be much faster than either of those databases on a single machine for analytic queries because it executes a query in parallel. Neither of those databases parallelizes the execution of individual queries. At the moment, using Presto requires setting up HDFS and Hive (even on a single machine), so getting started will be more work than if you already have an existing Hive installation.
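To make "executes a query in parallel" concrete, here is a minimal sketch of the general idea: split the data into partitions, aggregate each partition independently, then merge the partial results. This is not Presto's implementation. The names `partial_sum` and `parallel_avg` and the `amount` column are made up for illustration, and Python threads here only show the structure of the computation, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(rows):
    """Aggregate one partition: return (sum, count) for the hypothetical 'amount' column."""
    total = sum(r["amount"] for r in rows)
    return total, len(rows)


def parallel_avg(partitions, workers=4):
    """Compute AVG(amount) by aggregating partitions concurrently, then merging."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, partitions))
    # Merge step: combine the per-partition (sum, count) pairs.
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


# Three partitions standing in for data blocks spread across workers.
partitions = [
    [{"amount": 10}, {"amount": 20}],
    [{"amount": 30}],
    [{"amount": 40}, {"amount": 50}, {"amount": 60}],
]
print(parallel_avg(partitions))
```

A single-process database would scan all six rows in one pass; the point of the sketch is that the per-partition work has no dependencies, so it can be spread over as many workers as there are partitions.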

Hi David, can you also shed some light on the differences between Impala and Presto? When should one choose Presto, and when Impala? I saw on some blog that Facebook studied Impala before starting Presto. Thanks – Sourabh, Nov 18 '13 at 08:56

score 0 · Answer 2 · edited Aug 11 '16 at 03:59

Or, you can take a look at Impala, which has been available as production-ready software for six months. Like Presto, Impala is a distributed SQL query engine for data in HDFS that circumvents MapReduce. Unlike Presto, there is a commercial vendor providing support (Cloudera).

That said, David's comments about data size still apply. Use the right tool for the job.

edited Aug 11 '16 at 03:59

Billal Begueradj

answered Nov 06 '13 at 22:25

Justin Kestelyn

Presto has been in production at Facebook since January and has over 1,000 daily active users who run over 30,000 queries daily. It is definitely battle-tested software. (I work on Presto at Facebook) – David Phillips Nov 06 '13 at 23:12
I don't see the point of talking about competing product X when the question concerns product Y. – Damien Carol Feb 20 '14 at 14:01
So... should we all pick up product X and then figure out a use case for it later? I always thought it was the other way around ("I have big data I need to query", not "I want to use Presto, how much data do I need?") – Justin Kestelyn Jul 29 '14 at 22:51
This should be a comment, not an answer. – kirill_igum May 01 '16 at 00:46
