As part of a pytest suite, I'm trying to load the Delta Lake extensions into the Spark session like below:
import pytest
import pyspark
from pyspark.sql import SparkSession


@pytest.fixture(scope="function")
def spark_session(request):
    # spark = SparkSession.builder.master("local[*]").appName("ReportingDimensionDimTests").getOrCreate()
    conf = pyspark.SparkConf()
    # Pull in the Delta Lake jars and enable the Delta SQL extension and catalog.
    conf.set("spark.jars.packages", "io.delta:delta-core_2.12:2.3.0")
    conf.set("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    conf.set(
        "spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    )
    spark = (
        SparkSession.builder.master("local[*]")
        .appName("Test App")
        .config(conf=conf)
        .getOrCreate()
    )
    # Stop the session once the test using this fixture finishes.
    request.addfinalizer(lambda: spark.stop())
    return spark
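For reference, one quick sanity check (a sketch, assuming PySpark 3.x) is to read the settings back from whatever session getOrCreate() returned, right after building it inside the fixture; if an earlier session was reused, these will not match what was requested:

    # Right after getOrCreate() inside the fixture: verify the settings actually
    # landed on this session rather than being silently dropped.
    assert spark.sparkContext.appName == "Test App"
    assert spark.conf.get("spark.sql.extensions", None) == "io.delta.sql.DeltaSparkSessionExtension"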
And as part of a test, I tried to do some Delta Lake operations, which fail with the error Caused by: java.lang.ClassNotFoundException: delta.DefaultSource.
Test case:

def test_ConformedDimensions_ReportingDimensionDim_initial_run_test(spark_session):
    data = spark_session.range(0, 5)
    # The format("delta") write is what triggers the DataSource lookup that fails.
    data.write.format("delta").mode("overwrite").save("/tmp/delta-table2")
When I checked the Spark context of the session inside the test, the app name says pyspark-shell. I suspect this is because an existing session is being reused, so the config I specified in my fixture is not even picked up.
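To test that suspicion, one could inspect the session that is already active before the fixture's builder runs (a diagnostic sketch, assuming PySpark 3.0+, where SparkSession.getActiveSession() is available):

    from pyspark.sql import SparkSession

    # If some earlier import, plugin, or conftest already created a session,
    # getOrCreate() returns that object and ignores the new SparkConf,
    # which would explain the "pyspark-shell" app name.
    existing = SparkSession.getActiveSession()
    if existing is not None:
        print(existing.sparkContext.appName)  # e.g. "pyspark-shell"
        print(existing.sparkContext.getConf().get("spark.jars.packages", None))  # Delta jars likely absent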
How do I handle this?