I would like to validate results for row count and column freshness on some data on AWS. I am using a check_config.json file to configure the checks. I use terraform to make a Glue job to run the check and throw the result to DynamoDB. The result in DynamoDB is not elaborate and I would like the result to be more specific on the exact results obtained before marking a check as fail or pass. I would like to see, for example, when was the table last modified(column freshness) and number of rows obtained after a count (expect_row_count).
Below is the current result in DynamoDB:
Below is the json code:
{
"table": "table1",
"checks": [
{
"check": "custom_expect_column_to_be_fresh",
"parameters": {
"columns": [
"column1"
],
"strftime_format": "%Y-%m-%d",
"threshold_days": 0,
"threshold_hours": 10
}
},
{
"check": "expect_table_row_count_to_be_between",
"result_format" : "COMPLETE",
"include_config": "True",
"parameters": {
"min_value": 1,
"max_value": 100000
},
"alarm" : {
"threshold": 100,
"period": 3600
}
}
]
}
I was expecting a more elaborate result on how many rows were obtained before the row_count is marked as a failure and I also want to see the last table modification timestamp before column freshness marks as a failure.