I'm running Delta Lake locally. I set up a spark-master container, two spark-worker containers, and one spark-driver container with Docker. Inside the spark-driver container I run spark-submit, pointing it at spark://spark-master:7077. The spark-driver connects to spark-master, but it cannot find the _delta_log files. I receive the following error even though the file exists:
org.apache.spark.SparkFileNotFoundException: File file:/opt/ufo-lakehouse/lakehouse/ufo/bronze/_delta_log/00000000000000000000.json does not exist
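For context, the spark-submit invocation inside the spark-driver container looks roughly like this (the script path and the Delta package coordinates below are illustrative placeholders, not my exact command):

# rough sketch of the submit command; master URL matches the compose service name
spark-submit \
  --master spark://spark-master:7077 \
  --packages io.delta:delta-core_2.12:2.4.0 \
  /opt/ufo-lakehouse/src/spark/ingest_bronze.py   # placeholder script path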
Here is a snippet of my docker-compose.yaml. As you can see, I have the volumes mapped correctly.
version: '3.8'

networks:
  spark-network:

x-defaults:
  &spark-common
  image: bitnami/spark:3.4.0
  user: root
  networks:
    - spark-network
  volumes:
    - ./lakehouse:/opt/bitnami/spark/lakehouse
    - ./logs/spark:/opt/bitnami/spark/logs
    - ./src/spark/spark-defaults.conf:/opt/bitnami/spark/conf/spark-defaults.conf

services:
  spark-master:
    <<: *spark-common
    container_name: spark-master
    hostname: spark-master
    environment:
      - SPARK_MODE=master
      - SPARK_MASTER_HOST=spark-master
      - SPARK_MASTER_PORT=7077
      - SPARK_MASTER_WEBUI_PORT=8080
    ports:
      - 8080:8080
      - 7077:7077
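The mounted spark-defaults.conf contains the usual Delta Lake settings, roughly the following (this is a sketch from memory; the exact file may differ):

# standard Delta Lake settings so Spark can read/write Delta tables
spark.sql.extensions              io.delta.sql.DeltaSparkSessionExtension
spark.sql.catalog.spark_catalog   org.apache.spark.sql.delta.catalog.DeltaCatalog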
I have tried everything I can think of with permissions and nothing seems to work. I also set spark.hadoop.fs.permissions.umask-mode to 000, thinking Spark was writing the files with the wrong permissions, but that didn't help either.
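For reference, I set it roughly like this, as an entry in the mounted spark-defaults.conf (it could equally be passed as a --conf flag to spark-submit):

# attempt to stop Spark from applying a restrictive umask when writing Delta files
spark.hadoop.fs.permissions.umask-mode   000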
Are there any suggestions? Thanks in advance.