I am new to the Great Expectations package. I found this tutorial, which covers connecting to a data source, validating the data, and visualising the output as a Data Doc saved as HTML: https://docs.greatexpectations.io/docs/tutorials/getting_started/tutorial_setup

However, I am not able to run the CLI commands used in the tutorial. Is there a way to generate the Data Docs shown in that tutorial from a series of expectation results run on an in-memory pandas DataFrame?

This article shows how to run expectations on a pandas DataFrame read into memory and get a result dictionary for each expectation, but it does not explain how to convert those results into Data Docs: https://towardsdatascience.com/a-great-python-library-great-expectations-6ac6d6fe822e

Minimal Reproducible Example
Python==3.8.15
Packages: 
great-expectations==0.15.41
pandas==1.5.2

import pandas as pd
import great_expectations as gx

# simple dataframe
df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': ['a','b','c','d','e']})

# Turn into GX dataframe
df = gx.from_pandas(df)

df.head()
 [enter image description here](https://i.stack.imgur.com/5IC9R.png)

gx_result = df.expect_column_to_exist("A")

print(gx_result)
 [enter image description here](https://i.stack.imgur.com/yF3tS.png)

# Code to convert expectation result into data doc

I have also found this piece of documentation on creating Data Docs, but I am unsure how to connect it with the code above: https://docs.greatexpectations.io/docs/terms/data_docs/

Thanks in advance

1 Answer

Hi James, the following steps should get you there programmatically, without the CLI.

  1. Connect to your in-memory pandas DataFrame from Python. In this guide, look at the "no CLI + no filesystem" tab: https://docs.greatexpectations.io/docs/guides/connecting_to_your_data/in_memory/pandas/

  2. Create a Checkpoint. Follow the Python section of this guide, in particular section 5 (validate data), and replace the Spark DataFrame with your pandas DataFrame wherever applicable: https://docs.greatexpectations.io/docs/deployment_patterns/how_to_use_great_expectations_in_emr_serverless

You will need to combine the code from both guides and adapt it to your own context to get what you want.

Hope it helps.

Sarang Shinde
Thank you very much for your answer! In addition to the documentation you linked (which was very useful), I found the following documents helpful for adding expectations to an expectation suite without using the CLI and without a sample batch: 1. https://great-expectations.readthedocs.io/en/0.13.4/guides/how_to_guides/creating_and_editing_expectations/how_to_create_a_new_expectation_suite_without_the_cli.html 2. https://legacy.docs.greatexpectations.io/en/latest/guides/how_to_guides/creating_and_editing_expectations/how_to_create_a_new_expectation_suite_without_a_sample_batch.html – James Challis Jan 11 '23 at 13:39