
I have a Spark cluster running in a Docker container (using an image I made myself). It's all working fine.

I now want to use Apache Livy, and per the documentation I need to put a couple of environment variables in place: https://livy.incubator.apache.org/get-started/

export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf

My question is: since Spark is running in Docker rather than as a local installation, what options do I have for referencing those two directories in the exports?

This is actually a common problem I run into, so any advice on best practices would be much appreciated.

Thanks.

userMod2

2 Answers


The easiest option would be to install Livy inside the same Docker container that runs Spark and expose the required ports to the outside.
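
For example, a rough Dockerfile sketch of that first option. The base image name and the Livy version/download URL are placeholders (check the Livy download page for the current release), and it assumes curl and unzip are available in the base image:

FROM my-spark-image:latest

# the two variables Livy's getting-started guide asks for
ENV SPARK_HOME=/usr/lib/spark \
    HADOOP_CONF_DIR=/etc/hadoop/conf \
    LIVY_HOME=/opt/livy

# download and unpack a Livy binary release inside the image
RUN curl -fsSL -o /tmp/livy.zip \
        https://archive.apache.org/dist/incubator/livy/0.7.1-incubating/apache-livy-0.7.1-incubating-bin.zip \
    && unzip -q /tmp/livy.zip -d /opt \
    && mv /opt/apache-livy-0.7.1-incubating-bin $LIVY_HOME \
    && rm /tmp/livy.zip

# Livy's REST API listens on 8998 by default
EXPOSE 8998

# run the server in the foreground; if your Livy version's launcher script only
# daemonizes via "livy-server start", adjust this command accordingly
CMD ["/opt/livy/bin/livy-server"]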

A better solution would be to create a separate container for Livy (with the same config files used in /usr/lib/spark and /etc/hadoop/conf) and connect the two containers over a Docker network, exposing only the Livy port to the outside.
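
With the plain Docker CLI, that would look roughly like this (spark-net, my-spark-image, my-livy-image and the /path/to/... config directories are placeholder names; 8998 is Livy's default port):

docker network create spark-net

# Spark container: reachable on the network, nothing published to the host
docker run -d --name spark --network spark-net my-spark-image

# Livy container: same Spark/Hadoop config mounted in, only the Livy port exposed
docker run -d --name livy --network spark-net \
           -v /path/to/spark-conf:/usr/lib/spark/conf \
           -v /path/to/hadoop-conf:/etc/hadoop/conf \
           -p 8998:8998 \
           my-livy-image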

shanmuga

You can create a volume. A volume is a shared folder between your host machine and your Docker container. For example:

docker run -v /home/userName/Docker/spark:/usr/lib/spark \
           -v /home/userName/Docker/hadoop:/etc/hadoop/ \
           ...

Then you can point the environment variables at the host side of those volumes: export SPARK_HOME=/home/userName/Docker/spark and export HADOOP_CONF_DIR=/home/userName/Docker/hadoop/conf should work with this example.
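
For completeness, the host-side flow would then look roughly like this (assuming you have unpacked a Livy binary distribution on the host and are running it from that directory; the paths are the example ones above):

# point Livy at the host side of the mounted volumes
export SPARK_HOME=/home/userName/Docker/spark
export HADOOP_CONF_DIR=/home/userName/Docker/hadoop/conf

# then start Livy as described in the getting-started guide
./bin/livy-server start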