I would like to write a dataframe into Elasticsearch from within Databricks.

My Elasticsearch cluster is hosted on AWS, and Databricks spins up EC2 instances with a specific IAM role. That role has permission to interact with my Elasticsearch cluster, but for some reason I cannot even ping the cluster.

[Image: failed attempt to ping my cluster]

Do I need to find a way to squeeze both my Databricks workers and my Elasticsearch cluster into the same VPC? Sounds like a CloudFormation nightmare.

Adam

1 Answer


If Elasticsearch is running in a different VPC, you'll need either AWS PrivateLink or VPC peering so that the workers can reach it. For isolation, and to avoid running into IP address limits for your workers, it is better to keep Elasticsearch and Databricks in separate VPCs.
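Before changing any networking, it can help to confirm from a notebook whether the workers can open a TCP connection to Elasticsearch at all. A failing ping on its own is not conclusive, since security groups only pass ICMP when a rule explicitly allows it. Below is a minimal sketch; the hostname and port in the usage comment are placeholders, not values from the question.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Usage (hypothetical endpoint, replace with your own domain and port):
# print(can_connect("my-es-domain.example.com", 443))
```

If this returns `False` from a Databricks notebook but `True` from a machine inside the Elasticsearch VPC, the problem is most likely network routing rather than the role's permissions.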

Silvio
  • That's good to know. Thank you. By the way, I think the second link (to "peering") is somehow pointing to the same destination as the link to "private link". – Adam Dec 21 '19 at 21:32
  • Thank you @Silvio. I got as far as launching the Elasticsearch cluster inside a VPC and setting up a security group, subnet, network load balancer and VPC endpoint service. I now just need to create an interface endpoint inside the Databricks VPC and point it to the VPC endpoint service of my Elasticsearch cluster. Does Databricks create a new VPC every time I launch a new cluster? Do you know how I can access that VPC programmatically? – Adam Dec 24 '19 at 22:15
  • No, it's a single VPC created when you set up your account. You need an AWS administrator for your account to set the required configs. – Silvio Dec 25 '19 at 00:50
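
For the last step discussed in the comments (an interface endpoint in the Databricks VPC pointing at the Elasticsearch VPC endpoint service), an administrator could do something along these lines with boto3. This is only a sketch under assumptions: the function names are made up for illustration, every ID shown in the usage comment is a placeholder, and `PrivateDnsEnabled` is set to `False` on the assumption that the endpoint service has no private DNS name configured.

```python
from typing import Any, Dict, List


def build_endpoint_request(
    vpc_id: str,
    service_name: str,
    subnet_ids: List[str],
    security_group_ids: List[str],
) -> Dict[str, Any]:
    """Build keyword arguments for the EC2 create_vpc_endpoint call."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,                    # the Databricks VPC
        "ServiceName": service_name,        # the Elasticsearch endpoint service
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        "PrivateDnsEnabled": False,         # assumption, see note above
    }


def create_interface_endpoint(request: Dict[str, Any], region: str) -> str:
    """Create the endpoint and return its ID. Requires boto3 and AWS credentials."""
    import boto3  # imported here so building the request works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.create_vpc_endpoint(**request)
    return response["VpcEndpoint"]["VpcEndpointId"]


# Usage (all values are placeholders):
# req = build_endpoint_request(
#     "vpc-0123456789abcdef0",
#     "com.amazonaws.vpce.us-east-1.vpce-svc-0123456789abcdef0",
#     ["subnet-0123456789abcdef0"],
#     ["sg-0123456789abcdef0"],
# )
# print(create_interface_endpoint(req, "us-east-1"))
```

Depending on how the endpoint service is configured, the connection request may still need to be accepted on the service side before traffic flows.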