I'm running Delta Lake on my local machine. I set up a spark-master container, two spark-worker containers, and one spark-driver container with Docker. Inside the spark-driver container I run spark-submit, pointing it at spark://spark-master:7077. The spark-driver connects to spark-master; however, it cannot find the _delta_log files. I receive the following error even though the file exists:

org.apache.spark.SparkFileNotFoundException: File file:/opt/ufo-lakehouse/lakehouse/ufo/bronze/_delta_log/00000000000000000000.json does not exist
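
Since the path in the error uses the file: scheme, each Spark process resolves it against its own local container filesystem, so the JSON file has to be visible at the same absolute path inside the driver and every worker container. A quick check from the host looks like this (a sketch; the worker container name is an assumption, and the driver path is copied from the error message):

# List the Delta log from each container; all of these should succeed.
docker exec spark-master ls /opt/bitnami/spark/lakehouse/ufo/bronze/_delta_log
docker exec spark-worker-1 ls /opt/bitnami/spark/lakehouse/ufo/bronze/_delta_log
docker exec spark-driver ls /opt/ufo-lakehouse/lakehouse/ufo/bronze/_delta_log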

Here is a snippet of my docker-compose.yaml. As you can see, I have the volumes mapped correctly.

version: '3.8'

networks:
  spark-network:

x-defaults:
  &spark-common
  image: bitnami/spark:3.4.0
  user: root
  networks:
    - spark-network
  volumes:
    # Every service that merges this anchor sees the host ./lakehouse at this container path:
    - ./lakehouse:/opt/bitnami/spark/lakehouse
    - ./logs/spark:/opt/bitnami/spark/logs
    - ./src/spark/spark-defaults.conf:/opt/bitnami/spark/conf/spark-defaults.conf

services:
  spark-master:
    <<: *spark-common
    container_name: spark-master
    hostname: spark-master
    environment:
      - SPARK_MODE=master
      - SPARK_MASTER_HOST=spark-master
      - SPARK_MASTER_PORT=7077
      - SPARK_MASTER_WEBUI_PORT=8080
    ports:
      - 8080:8080
      - 7077:7077
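
My two spark-worker services merge the same anchor and therefore inherit the identical volume list. They look roughly like this (paraphrased from my file; the environment values follow the bitnami/spark image conventions):

  spark-worker-1:
    <<: *spark-common           # inherits ./lakehouse -> /opt/bitnami/spark/lakehouse
    container_name: spark-worker-1
    hostname: spark-worker-1
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark-master:7077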

I tried everything I could with permissions and nothing seems to work. I also set spark.hadoop.fs.permissions.umask-mode to 000 in the mounted spark-defaults.conf, thinking Spark was writing files with the wrong permissions, but this didn't help either.
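
For completeness, this is roughly how that file looks; the two Delta entries are my assumption of what a minimal Delta Lake setup needs, and the last line is the umask workaround that didn't help:

# Mounted at /opt/bitnami/spark/conf/spark-defaults.conf (see the compose volume above)
spark.sql.extensions                     io.delta.sql.DeltaSparkSessionExtension
spark.sql.catalog.spark_catalog          org.apache.spark.sql.delta.catalog.DeltaCatalog
spark.hadoop.fs.permissions.umask-mode   000

Are there any suggestions? Thanks in advance.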

prime90
  • Logs suggest it's looking for the file in `/opt/ufo-lakehouse/lakehouse/ufo/bronze/_delta_log/00000000000000000000.json`, but your volume is mounted at `/opt/bitnami/spark/lakehouse`. Can you confirm exactly where the _delta_log folder exists inside the container? – o_O Jun 21 '23 at 14:43
  • I can confirm that in my 'spark-driver' container (not shown in the docker-compose file above), the Delta Lake program created the bronze, silver, and gold directories, along with the _delta_log folder and the .json file it says is missing. I can also see that the folders are created on my local host (outside of the Docker containers). However, even though the ./lakehouse directory is mapped into spark-master and the spark-workers, the lakehouse directory is empty in those containers. Do I also need to map each subdirectory? – prime90 Jun 22 '23 at 01:39

0 Answers