I am using great expectations to test streaming data (I collect a sample into a batch and test the batch). The issue is I cannot use the docs because this will results in 100 of 1000s of html pages being generated. What I would like to do is use my api to generate the page requested from the json result when the specific test results are clicked on (via the index page). Is great expectations able to generate only 1 html which can be disposed of when it is closed?
1 Answers
If you are using a ValidationOperator / Checkpoint, then using the UpdateDataDocsAction
action supports only building the resources that were validated in that run, and is the recommended approach.
If you are interacting directly with the DataContext API, then the build_data_docs
method on DataContext supports a resource identifier option that you can use to request only a single asset is built. I think to get the behavior you're looking for (a truly ephemeral build of just that page), you'd want to pair that with a site configuration for a site in a temporary location, e.g. /tmp.
The docs on the build_data_docs method are here: https://docs.greatexpectations.io/en/latest/autoapi/great_expectations/data_context/data_context/index.html#great_expectations.data_context.data_context.BaseDataContext.build_data_docs
Note that the resource_identifiers parameter requires, e.g. a ValidationResultIdentifier object, such as:
context.build_data_docs("local_site", resource_identifiers=[ValidationResultIdentifier(
run_id="20201203T182816.362147Z",
expectation_suite_identifier=ExpectationSuiteIdentifier("foo"),
batch_identifier="b739515cf1c461d67b4e56d27f3bfd02",
)])

- 576
- 2
- 5
-
Awesome, I will try this. Thank you :) – Andy MGF Dec 04 '20 at 09:23