
I would like to validate row count and column freshness for some data on AWS. The checks are configured in a check_config.json file, and I use Terraform to create a Glue job that runs them and writes the results to DynamoDB. The result stored in DynamoDB is not detailed enough: I would like it to show the actual values observed before a check is marked as pass or fail. For example, I would like to see when the table was last modified (for the column freshness check) and the number of rows counted (for expect_table_row_count_to_be_between).

Below is the current result in DynamoDB:

[Image: screenshot of the current result in DynamoDB]

Below is the JSON configuration (check_config.json):

    {
        "table": "table1",
        "checks": [
            {
                "check": "custom_expect_column_to_be_fresh",
                "parameters": {
                    "columns": [
                        "column1"
                    ],
                    "strftime_format": "%Y-%m-%d",
                    "threshold_days": 0,
                    "threshold_hours": 10
                }
            },
            {
                "check": "expect_table_row_count_to_be_between",
                "result_format": "COMPLETE",
                "include_config": "True",
                "parameters": {
                    "min_value": 1,
                    "max_value": 100000
                },
                "alarm": {
                    "threshold": 100,
                    "period": 3600
                }
            }
        ]
    }

I was expecting a more detailed result: the number of rows actually counted before the row count check is marked as a failure, and the table's last modification timestamp before the column freshness check is marked as a failure.
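To make the desired output concrete, here is a minimal sketch of the kind of item that could be written to DynamoDB. It assumes the Glue job has each check's validation result available as a plain dict with a `success` flag and a `result` sub-dict, and that the row count check reports the counted rows under `result["observed_value"]`. The function name `build_result_item`, the item attribute names, and the `last_modified` key for the custom freshness check are hypothetical; the custom expectation would have to put that timestamp into its own result for it to appear here.

```python
from datetime import datetime, timezone
from decimal import Decimal


def _to_dynamo(value):
    """Convert values that boto3's DynamoDB serializer rejects or cannot store."""
    if isinstance(value, float):
        # boto3 does not accept Python floats for DynamoDB numbers; use Decimal.
        return Decimal(str(value))
    if isinstance(value, datetime):
        return value.isoformat()
    return value


def build_result_item(table, check, validation_result, parameters=None, run_time=None):
    """Flatten one check result into a dict that could be stored as a DynamoDB item.

    Attribute names are illustrative, not taken from the existing Glue job.
    """
    result = validation_result.get("result") or {}
    run_time = run_time or datetime.now(timezone.utc)

    item = {
        "table": table,
        "check": check,
        "status": "pass" if validation_result.get("success") else "fail",
        "run_time": run_time.isoformat(),
    }

    # Keep the configured thresholds next to the outcome so a failure is self-explanatory.
    if parameters:
        item["parameters"] = {k: _to_dynamo(v) for k, v in parameters.items()}

    # Row count (and other built-in checks) report what was actually measured here.
    observed = result.get("observed_value")
    if observed is not None:
        item["observed_value"] = _to_dynamo(observed)

    # Hypothetical key: assumes the custom freshness check returns the newest timestamp it saw.
    last_modified = result.get("last_modified")
    if last_modified is not None:
        item["last_modified"] = _to_dynamo(last_modified)

    return item


if __name__ == "__main__":
    example = build_result_item(
        table="table1",
        check="expect_table_row_count_to_be_between",
        validation_result={"success": False, "result": {"observed_value": 0}},
        parameters={"min_value": 1, "max_value": 100000},
    )
    print(example)
```

The resulting dict could then be passed as `Item=` to a boto3 DynamoDB `Table.put_item` call, so each record shows the observed row count or timestamp alongside the pass/fail status.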

Marko E
