I am implementing data quality checks using the Great Expectations library. The dataset is about 80 GB and has 513,749,893 rows.
Below is the code I am using to run a uniqueness check on one of the columns:
import great_expectations as ge

# Load the raw table as a Spark DataFrame
df = spark.sql("select * from rawdata")

# Wrap the DataFrame in a Great Expectations dataset (legacy SparkDFDataset API)
gedf = ge.dataset.SparkDFDataset(df)

# Uniqueness check on the ID column; COMPLETE asks for every unexpected value back
DQI = gedf.expect_column_values_to_be_unique("ID", result_format="COMPLETE")
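One thing I suspect is the result format: as I understand it, result_format="COMPLETE" makes Great Expectations return the full list of non-unique values, which on half a billion rows could pull a huge result back to the driver. A lighter variant I am considering (a minimal sketch against the same table and column) is:

# Same check, but SUMMARY returns only counts and a small sample of
# unexpected values instead of the complete list of duplicates.
DQI = gedf.expect_column_values_to_be_unique("ID", result_format="SUMMARY")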
I am getting an error like "Python kernel unresponsive", and I do not understand whether this is caused by the memory of my cluster or by something else. My cluster configuration: 6 workers with 768 GB memory and 96 cores in total, and 1 driver with 128 GB memory and 32 cores. Does Great Expectations run on multiple cores? Is this a memory issue?
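To help isolate whether the problem is Great Expectations or the computation itself, I am also considering a plain-PySpark duplicate check (a sketch; it stays distributed and only brings a single count back to the driver, rather than the duplicate values themselves):

from pyspark.sql import functions as F

df = spark.sql("select * from rawdata")

# Count how many distinct ID values occur more than once; the aggregation
# runs on the workers and only one number is collected to the driver.
dup_count = (
    df.groupBy("ID")
      .count()
      .filter(F.col("count") > 1)
      .count()
)
print(f"IDs with duplicates: {dup_count}")

If this baseline finishes quickly on the same cluster, that would suggest the kernel problem comes from materializing the COMPLETE result rather than from the cluster being undersized.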