
The Delta JAR delta-core_2.11-0.6.1.jar has been added to the "SPARK_HOME/jars" directory on the EMR master node. However, when I call the Delta API from an EMR Notebook, I get the following error:

# The notebook already provides a default Spark session, so I didn't run the following lines:
# spark = SparkSession.builder.appName("MyApp") \
#    .config("spark.jars.packages", "io.delta:delta-core_2.11:0.6.1") \
#    .getOrCreate()

from delta.tables import * # ModuleNotFoundError: No module named 'delta'

The CLI command pyspark --packages "io.delta:delta-core_2.11:0.6.1" works fine on the master node, and I can access the Delta APIs from that shell.

Is there any way to use the Delta APIs directly in the notebook? Please suggest.

Sarada Rout
One way that works:

```python
sc = spark.sparkContext
# addPyFile puts the JAR on the driver's sys.path and ships it to the executors
sc.addPyFile("/usr/lib/spark/jars/delta-core_2.11-0.6.1.jar")
from delta.tables import *  # working fine now
```

– Sarada Rout Dec 04 '20 at 09:48
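The addPyFile workaround appears to work because a JAR is just a zip archive, and the delta-core JARs of this era also seem to bundle the Python `delta` package. Once the archive is on `sys.path`, Python's zipimport machinery can import modules straight out of it. A minimal sketch of that mechanism, assuming the JAR contains `delta/tables.py` (the helper `put_jar_on_path` is hypothetical, not part of Spark or Delta):

```python
import sys
import zipfile


def put_jar_on_path(jar_path: str, module_file: str = "delta/tables.py") -> bool:
    """Hypothetical helper: prepend a JAR to sys.path if it bundles `module_file`.

    A JAR is a zip archive, so once it is on sys.path, Python's zipimport
    machinery can import packages stored inside it. This only affects the
    local (driver) interpreter; unlike SparkContext.addPyFile, it does not
    ship the file to executors.
    """
    with zipfile.ZipFile(jar_path) as jar:
        if module_file not in jar.namelist():
            return False  # the Python package is not bundled in this JAR
    if jar_path not in sys.path:
        sys.path.insert(0, jar_path)
    return True


# Usage on the EMR master node (path taken from the comment above):
# if put_jar_on_path("/usr/lib/spark/jars/delta-core_2.11-0.6.1.jar"):
#     from delta.tables import DeltaTable
```

Checking the archive first gives a clear True/False instead of a confusing ModuleNotFoundError when a given JAR does not ship the Python wrapper.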

1 Answer


The tables.py file containing the DeltaTable class is in the delta repo on GitHub, here: https://github.com/delta-io/delta/tree/master/python/delta

You can either clone the repo (remember to select the branch that matches your Delta version) or copy the file and upload it to Jupyter inside a folder named delta. Either way, Python has to be able to find the delta package, so you'll need something like:

import sys
# Add the directory that contains the delta folder, not the delta folder itself
sys.path.append('/mnt/jupyterhome/<username>/<folder_containing_delta_package>')
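If you clone the repo, `from delta.tables import *` needs the directory that *contains* the `delta` package on `sys.path`. In the repo linked above, that is the `python/` folder. A small sketch, assuming a checkout whose layout matches that link; `add_delta_checkout_to_path` is a hypothetical helper and the path in the usage comment is a placeholder:

```python
import os
import sys


def add_delta_checkout_to_path(repo_root: str) -> str:
    """Hypothetical helper: make `import delta.tables` resolve to a delta-io/delta checkout.

    Assumes the layout <repo_root>/python/delta/tables.py, matching the GitHub
    link above. Returns the directory that was appended to sys.path.
    """
    python_dir = os.path.join(repo_root, "python")
    tables_py = os.path.join(python_dir, "delta", "tables.py")
    if not os.path.isfile(tables_py):
        raise FileNotFoundError(f"expected {tables_py}; check the repo path and branch")
    if python_dir not in sys.path:
        sys.path.append(python_dir)
    return python_dir


# Usage in the notebook (placeholder path, adjust to where you cloned/uploaded it):
# add_delta_checkout_to_path("/mnt/jupyterhome/<username>/delta")
# from delta.tables import DeltaTable
```

This only makes the Python wrapper importable. DeltaTable delegates to the JVM classes in the delta-core JAR, so that JAR still has to be on the Spark classpath, as it already is in the question's SPARK_HOME/jars setup.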

Hopefully that'll get you up and running!

datamonk3y