
I am trying to use the Delta Lake Python library in my AWS Glue job, but the job fails with the error "NameError: name 'DeltaTable' is not defined". Per the Glue–Delta Lake documentation, I added the parameter --datalake-formats = delta and also updated the required Spark configuration:

.config("spark.sql.extensions","io.delta.sql.DeltaSparkSessionExtension")
.config("spark.sql.catalog.spark_catalog","org.apache.spark.sql.delta.catalog.DeltaCatalog")
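For context, a minimal sketch of the full session setup these two lines belong to (assuming a Glue job started with --datalake-formats = delta, which puts the Delta Lake JARs on the classpath; names here are illustrative):

```python
from pyspark.sql import SparkSession

# Sketch only: assumes the Glue job parameter --datalake-formats = delta
# is set so the Delta Lake JARs are available on the classpath.
spark = (
    SparkSession.builder
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)
```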

My code fails at the line below:

deltaTable = DeltaTable.forPath(self.spark,self.dest_path_sdad)

Any ideas?

Jatin

2 Answers


These configuration properties enable the Delta Lake file format in Glue, so you can write spark.read.format("delta").load(...) or df.write.format("delta").save(...). But they don't provide the Python API, which ships separately as the delta-spark package. You can make it available to Glue with the --additional-python-modules option (doc).
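If you do need to add the package explicitly, a sketch using the AWS CLI; the job name, script location, role, and pinned version are placeholders for your own values:

```shell
# Hypothetical job name, script path, and role; adjust to your environment.
aws glue update-job \
  --job-name my-delta-job \
  --job-update '{
    "Role": "MyGlueServiceRole",
    "Command": {"Name": "glueetl", "ScriptLocation": "s3://my-bucket/scripts/job.py"},
    "DefaultArguments": {
      "--additional-python-modules": "delta-spark"
    }
  }'
```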

Alex Ott
  • Thanks for getting back! Actually, Glue now supports passing the data lake format as a parameter (--datalake-formats = delta), in which case it includes the Python API and the required JARs without having to add them explicitly. The issue was that I was missing the import statement below; adding it fixed the problem. from delta.tables import * – Jatin Jan 27 '23 at 04:02

I was missing the import statement

from delta.tables import *
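Putting it together, the working pattern looks roughly like this (a sketch: `spark` is assumed to be a session configured as in the question, and the path is a placeholder):

```python
from delta.tables import DeltaTable  # the missing import

# Placeholder path; replace with your actual Delta table location.
dest_path = "s3://my-bucket/delta/my-table/"

# With the import in place, DeltaTable is defined and forPath resolves
# the table at the given path using the configured Spark session.
deltaTable = DeltaTable.forPath(spark, dest_path)
deltaTable.toDF().show()
```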
Jatin