How much data do I need to have to make use of Presto? The web site states that it can query data sizes from gigabytes to petabytes. I understand how it is used to query very large datasets, but is anyone using it for hundreds of gigabytes?
2 Answers
Currently, Presto is most useful if you already have an existing Hive installation. If you are using Hive, you should definitely try Presto. If all your data fits in a relational database like PostgreSQL or MySQL on a single machine, and you are happy with the performance, then keep using that.
However, Presto should be much faster than either of those databases on a single machine for analytic queries because it executes a query in parallel. Neither of those databases parallelizes the execution of individual queries. At the moment, using Presto requires setting up HDFS and Hive (even on a single machine), so getting started will be more work than if you already have an existing Hive installation.
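
To illustrate what "executes a query in parallel" means here, the following is a minimal sketch of the general split/aggregate/merge pattern behind something like `SELECT avg(x)`. It is not Presto's code or API: the function names `partial_sum_count` and `parallel_average` are made up for this example, and Python threads merely stand in for worker nodes (they show the structure, not a real speedup).

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum_count(chunk):
    """Per-worker stage: reduce one partition to a (sum, count) pair."""
    return sum(chunk), len(chunk)


def parallel_average(values, workers=4):
    """Split values into partitions, aggregate each one independently,
    then merge the partial results into the final average."""
    if not values:
        raise ValueError("values must be non-empty")
    size = -(-len(values) // workers)  # ceiling division: rows per partition
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    # Hypothetical stand-in for distributing partitions to worker nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_count, chunks))
    # Final stage: combine the partial aggregates.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count
```

Calling `parallel_average(list(range(1, 101)))` splits the 100 values into four partitions of 25 and merges the four partial results. A database that runs each query on a single thread does the equivalent of the per-partition step once, over the whole table.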

Hi David, can you also shed some light on the differences between Impala and Presto? When should one choose Presto and when Impala? I saw in a blog post that Facebook did some study of Impala before starting Presto. Thanks – Sourabh Nov 18 '13 at 08:56
Or you can take a look at Impala, which has been available as production-ready software for six months. Like Presto, Impala is a distributed SQL query engine for data in HDFS that circumvents MapReduce. Unlike Presto, there is a commercial vendor providing support (Cloudera).
That said, David's comments about data size still apply. Use the right tool for the job.

Presto has been in production at Facebook since January and has over 1,000 daily active users who run over 30,000 queries daily. It is definitely battle-tested software. (I work on Presto at Facebook) – David Phillips Nov 06 '13 at 23:12
I don't see the point of talking about competing product X when the question concerns product Y. – Damien Carol Feb 20 '14 at 14:01
So... we should all pick up product X, and then figure out a use case for it later? I always thought it was the other way around ("I have big data I need to query", not "I want to use Presto, how much data do I need?") – Justin Kestelyn Jul 29 '14 at 22:51