I am implementing data quality checks using the Great Expectations library. The dataset is about 80 GB and has 513,749,893 rows.
Below is the code I am using to run a uniqueness check on one of the columns:
import great_expectations as ge

# Load the raw table as a Spark DataFrame
df = spark.sql("select * from rawdata")

# Wrap the DataFrame in a Great Expectations dataset (legacy SparkDFDataset API)
gedf = ge.dataset.SparkDFDataset(df)

# Uniqueness check on the ID column; COMPLETE asks for every unexpected value back
DQI = gedf.expect_column_values_to_be_unique("ID", result_format="COMPLETE")
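One thing I suspect is the result format: as I understand it, result_format="COMPLETE" makes Great Expectations return the full list of non-unique values, which on half a billion rows could pull a huge result back to the driver. A lighter variant I am considering (a minimal sketch against the same table and column) is:

# Same check, but SUMMARY returns only counts and a small sample of
# unexpected values instead of the complete list of duplicates.
DQI = gedf.expect_column_values_to_be_unique("ID", result_format="SUMMARY")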
I am getting an error like "Python kernel unresponsive", and I do not understand whether this is caused by the memory of my cluster or by something else. My cluster configuration: 6 workers with 768 GB memory and 96 cores in total, and 1 driver with 128 GB memory and 32 cores. Does Great Expectations run on multiple cores? Is this a memory issue?
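To help isolate whether the problem is Great Expectations or the computation itself, I am also considering a plain-PySpark duplicate check (a sketch; it stays distributed and only brings a single count back to the driver, rather than the duplicate values themselves):

from pyspark.sql import functions as F

df = spark.sql("select * from rawdata")

# Count how many distinct ID values occur more than once; the aggregation
# runs on the workers and only one number is collected to the driver.
dup_count = (
    df.groupBy("ID")
      .count()
      .filter(F.col("count") > 1)
      .count()
)
print(f"IDs with duplicates: {dup_count}")

If this baseline finishes quickly on the same cluster, that would suggest the kernel problem comes from materializing the COMPLETE result rather than from the cluster being undersized.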