
As part of a pytest fixture, I'm trying to load the Delta Lake extensions into the Spark session like below:

import pytest
import pyspark
from pyspark.sql import SparkSession


@pytest.fixture(scope="function")
def spark_session(request):
    # spark = SparkSession.builder.master("local[*]").appName("ReportingDimensionDimTests").getOrCreate()
    # Pull in the Delta Lake package and register its SQL extension and catalog.
    conf = pyspark.SparkConf()
    conf.set("spark.jars.packages", "io.delta:delta-core_2.12:2.3.0")
    conf.set("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    conf.set(
        "spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    )

    spark = (
        SparkSession.builder.master("local[*]")
        .appName("Test App")
        .config(conf=conf)
        .getOrCreate()
    )
    request.addfinalizer(lambda: spark.stop())
    return spark

As part of a test, I tried to do a Delta Lake write operation, and it fails with the error Caused by: java.lang.ClassNotFoundException: delta.DefaultSource.

Test case:

def test_ConformedDimensions_ReportingDimensionDim_initial_run_test(spark_session):
    data = spark_session.range(0, 5)
    data.write.format("delta").mode("overwrite").save("/tmp/delta-table2")

When I checked the Spark context of the session in the test, the app name says pyspark-shell. I suspect this is because an existing session is being reused, and the config I specified in my fixture is not even being picked up.
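For reference, this is roughly how I inspected the session inside a test (the test name and the specific keys checked here are just illustrative, not part of my actual suite):

def test_inspect_session(spark_session):
    # App name comes back as "pyspark-shell" instead of "Test App"
    print(spark_session.sparkContext.appName)
    # The Delta settings appear to be missing from the effective configuration
    print(spark_session.sparkContext.getConf().get("spark.jars.packages", "not set"))
    print(spark_session.conf.get("spark.sql.extensions", "not set"))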

How can I handle this?

Santosh Hegde

0 Answers