
While creating infrastructure for big data, I wanted to use Azure HDInsight with a Presto installation. Azure HDInsight comes in different flavors such as Hadoop, Spark, etc. The documentation recommends using a Hadoop cluster, but I want to use the Spark one.

Is it possible to use a Spark cluster with Starburst's Presto distribution?

asked by Bhanuday Birla

1 Answer


It looks like you want to use both Presto and Spark at the same time. If you run them on a single cluster, you need to configure them carefully so that the JVMs of the different processes can co-exist. This is possible, but hard to do in practice (you need to understand how the JVM allocates memory beyond the -Xmx setting), so it's definitely not recommended.
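To give a rough sense of what that colocation would involve, here is a sketch of how a single worker node's memory might be partitioned between a Presto worker and Spark executors. The property names are standard Presto (etc/jvm.config, etc/config.properties), Spark, and YARN settings, but the node size and all the numbers are purely illustrative assumptions, not recommendations:

    # Illustrative only: splitting a ~56 GB worker node between Presto and Spark.
    # The settings are annotated here in one listing; in practice they live in
    # separate Presto, Spark, and YARN configuration files.

    # Presto worker, etc/jvm.config -- cap the heap well below total node memory
    -Xmx24G

    # Presto worker, etc/config.properties -- per-node query memory must fit in that heap
    query.max-memory-per-node=16GB

    # YARN / Spark -- leave room for Presto's heap plus JVM overhead on each node
    yarn.nodemanager.resource.memory-mb=24576
    spark.executor.memory=6g
    spark.executor.memoryOverhead=1g

Even with such a split, you are trading away memory that either engine could otherwise use, which is part of why running them side by side on one cluster is discouraged.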

I can imagine that in some on-premises installations, where provisioning new hardware is hard, you might want to colocate services on one cluster. In the cloud, however, it's much more convenient to provision two separate clusters, each appropriately sized for your particular needs and workload. For example, you could have one cluster with Presto for interactive analytics, dashboarding and ad-hoc queries, and another one with Spark for your machine learning or ETL workloads.
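As a rough illustration of the two-cluster approach, here is a sketch of provisioning a dedicated Spark HDInsight cluster with the Azure CLI; the resource group, cluster name, storage account, node sizes and credentials are all placeholder assumptions, and the Presto cluster would be created separately by following the Starburst documentation referenced below:

    # Sketch only: a separate Spark HDInsight cluster for ETL / ML workloads.
    az hdinsight create \
        --name my-spark-cluster \
        --resource-group my-resource-group \
        --type spark \
        --http-user admin \
        --http-password "$CLUSTER_PASSWORD" \
        --ssh-user sshuser \
        --ssh-password "$CLUSTER_PASSWORD" \
        --storage-account mystorageaccount \
        --workernode-count 4 \
        --workernode-size Standard_D13_v2

Sizing each cluster independently also means you can scale or tear down the Spark cluster without touching the Presto one.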

Please refer to the Starburst Presto on Azure documentation for detailed configuration instructions.

answered by Piotr Findeisen