
The following code converts an Apache Spark DataFrame to a Great Expectations DataFrame. For example, to convert the Spark DataFrame `spkDF` to a Great Expectations DataFrame I would do the following:

from great_expectations.dataset import SparkDFDataset
ge_df = SparkDFDataset(spkDF)

Can someone let me know how to convert a Great Expectations DataFrame back to a Spark DataFrame? In other words, what would I need to do to convert the new Great Expectations DataFrame ge_df back to a Spark DataFrame?


1 Answer


According to the official documentation, the class SparkDFDataset holds the original pyspark dataframe:

This class holds an attribute spark_df which is a spark.sql.DataFrame.

So you should be able to access it with:

ge_df.spark_df
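
For example, a minimal round-trip sketch, assuming an existing SparkSession, a pyspark DataFrame named `spkDF` as in the question, and the legacy SparkDFDataset API (the exact import path may differ between Great Expectations versions):

```python
# A minimal sketch, assuming the legacy (pre-1.0) Great Expectations
# SparkDFDataset API and an existing pyspark DataFrame `spkDF`.
from great_expectations.dataset import SparkDFDataset

ge_df = SparkDFDataset(spkDF)

# The wrapped pyspark DataFrame is exposed via the spark_df attribute,
# so ordinary Spark operations still work on it.
original_df = ge_df.spark_df
original_df.show(5)
```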
  • Thanks blackbishop. That actually answered my question, but what I was hoping was that I could use the same method to convert the validation results from Great Expectations into a dataframe. For example, if I were to write `validation_result.spark_df` I would get the following error: ```AttributeError: 'ExpectationSuiteValidationResult' object has no attribute 'spark_df'``` – Patterson Nov 11 '21 at 13:48
  • @Patterson Oh I see. Not sure you can do this directly. Maybe you can use [`to_json_dict`](https://github.com/great-expectations/great_expectations/blob/d51afdd1af1ec1627cb5ee541847759255ce3bf1/great_expectations/core/expectation_validation_result.py#L167) on the `ExpectationSuiteValidationResult` to get the result as a Python dict, then create a pyspark dataframe from it. – blackbishop Nov 11 '21 at 13:56
  • Is that something you could provide some guidance on, please? – Patterson Nov 11 '21 at 13:58
  • Or do you think I should open up another question? – Patterson Nov 11 '21 at 13:59
  • @Patterson I can't try this right now but it should be easy to test it yourself. First, get the result as dict `result_dict = validation_result.to_json_dict()`. Then create [spark dataframe using the dict](https://stackoverflow.com/questions/43751509/how-to-create-new-dataframe-with-dict): `df = spark.createDataFrame(result_dict)` – blackbishop Nov 11 '21 at 14:07
  • Thanks for your latest comments. I will try your suggestion. – Patterson Nov 11 '21 at 14:38
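
Following the suggestion in the comments above, a minimal sketch of turning the validation result into a Spark DataFrame might look like this. The names `validation_result` and `spark` are taken from the discussion, and the exact keys of the dict returned by `to_json_dict()` may differ between Great Expectations versions:

```python
import json

# A minimal sketch, assuming an existing SparkSession `spark` and an
# ExpectationSuiteValidationResult `validation_result` (names taken from
# the discussion above); the dict layout may vary by GE version.
result_dict = validation_result.to_json_dict()

# Flatten the per-expectation results into one row per expectation;
# kwargs is serialised to a JSON string so the schema stays simple.
rows = [
    {
        "expectation_type": r["expectation_config"]["expectation_type"],
        "success": r["success"],
        "kwargs": json.dumps(r["expectation_config"]["kwargs"]),
    }
    for r in result_dict.get("results", [])
]

validation_df = spark.createDataFrame(rows)
validation_df.show(truncate=False)
```

Note that `spark.createDataFrame` needs at least one row to infer a schema, so this assumes the suite contains at least one expectation result.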