I am using Great Expectations in Databricks.

I am using a shared cluster, and the runtime version is 13.1 Beta (includes Apache Spark 3.4.0, Scala 2.12):

  • py4j version 0.10.9.7
  • pyspark version 3.4.0

Here is my code:

%pip install great_expectations
dbutils.library.restartPython()

import great_expectations as gx
from great_expectations.checkpoint import SimpleCheckpoint

context_root_dir = "abfss://<container>@<acc>.dfs.core.windows.net/tmp/great_expectations/"
context = gx.get_context(context_root_dir=context_root_dir)
print(context)

from pyspark.sql import SparkSession
import pandas as pd

session_name = 'mk_spark_session'
spark = SparkSession.builder.appName(session_name).getOrCreate()

query = "SELECT * FROM my_test_table limit 10"
spark_df = spark.sql(query)
# print(spark_df)
# (returns --> DataFrame[<data>])

dataframe_datasource = context.sources.add_or_update_spark(
    name="my_spark_in_memory_datasource",
)
print(dataframe_datasource)
# (returns --> name: my_spark_in_memory_datasource
#              type: spark)


dataframe_asset = dataframe_datasource.add_dataframe_asset(
    name="MK_DF_asset",
    dataframe=spark_df,
)
print(dataframe_asset)

# (returns --> batch_metadata: {}
#              name: MK_DF_asset
#              type: dataframe)

# Not sure why batch_metadata is blank?


batch_request = dataframe_asset.build_batch_request()
print(batch_request)

# (returns --> datasource_name='my_spark_in_memory_datasource' data_asset_name='MK_DF_asset' options={})


# create expectation
expectation_suite_name = "MK_expectation_suite"

context.add_or_update_expectation_suite(expectation_suite_name=expectation_suite_name)

########################################################################
# I get an error on the following command:
validator = context.get_validator(
    batch_request=batch_request,
    expectation_suite_name=expectation_suite_name,
)
########################################################################

print(validator.head())


And I get the following error:

py4j.security.Py4JSecurityException: Constructor public org.apache.spark.SparkConf(boolean) is not whitelisted.

Py4JError: An error occurred while calling None.org.apache.spark.SparkConf. Trace:
py4j.security.Py4JSecurityException: Constructor public org.apache.spark.SparkConf(boolean) is not whitelisted.
  at py4j.security.WhitelistingPy4JSecurityManager.checkConstructor(WhitelistingPy4JSecurityManager.java:451)
  at py4j.Gateway.invoke(Gateway.java:256)
  at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
  at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
  at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
  at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
  at java.lang.Thread.run(Thread.java:750)

I couldn't figure out why I am getting this error. It could be a compatibility issue, but when I checked, I was using the latest versions.
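For reference, this is a quick sanity-check sketch of how the installed versions can be confirmed from the notebook (using the standard PyPI package names; your exact version numbers will of course differ):

```python
# Print the installed versions of the packages involved, without importing them.
from importlib.metadata import version, PackageNotFoundError

for pkg in ("great_expectations", "pyspark", "py4j"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "is not installed")
```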

  • Can you edit your post to include the bigger stack trace - it's not clear from which line the error is coming – Alex Ott May 28 '23 at 10:07
  • The error comes from this command: validator = context.get_validator(batch_request=batch_request, expectation_suite_name=expectation_suite_name) – Milind Keer May 31 '23 at 12:03
