I am trying to set up data quality validation for a set of files in S3. For that I chose AWS Glue DataBrew and created a dataset, a data quality ruleset, and a profile job via a SAM template. Once the dataset is created, I have to reference its ARN when creating the ruleset, and the ruleset's ARN when creating the profile job. According to the documentation, the ARN is not among the return values for either the dataset or the ruleset. Is it possible to reference these values dynamically, or should I create the rulesets separately?
SampleDataSet:
  Type: AWS::DataBrew::Dataset
  Properties:
    Name: SampleDataSet
    Input:
      S3InputDefinition:
        Bucket: *****
        Key: *****
SampleRuleSet:
  Type: AWS::DataBrew::Ruleset
  Properties:
    Name: SampleRuleSet
    Rules:
      - Name: rule1
        Disabled: true
        CheckExpression: "AGG(DUPLICATE_ROWS_COUNT) <= :val1"
        SubstitutionMap:
          - Value: "0"
            ValueReference: ":val1"
    TargetArn: !GetAtt SampleDataSet.Arn
  DependsOn: SampleDataSet
SampleProfileJob:
  Type: AWS::DataBrew::Job
  Properties:
    Name: SampleProfileJob
    Type: PROFILE
    RoleArn: !GetAtt GenericDataBrewDataQualityRole.Arn
    DatasetName: SampleDataSet
    Timeout: 5
    ValidationConfigurations:
      - RulesetArn: !GetAtt SampleRuleSet.Arn
    OutputLocation:
      Bucket: *****
  DependsOn: SampleRuleSet
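
If `!GetAtt ... .Arn` really is unavailable for these resource types, one workaround I am considering is building the ARNs by hand with `!Sub`. This is only a sketch and rests on two assumptions I have not verified against a deployment: that `Ref` on a DataBrew dataset or ruleset returns the resource name, and that DataBrew ARNs follow the pattern `arn:<partition>:databrew:<region>:<account-id>:<resource-type>/<name>`. Only the changed properties are shown; everything else would stay as in the template.

```yaml
SampleRuleSet:
  Type: AWS::DataBrew::Ruleset
  Properties:
    Name: SampleRuleSet
    # Assumption: ${SampleDataSet} resolves (via Ref) to the dataset name.
    TargetArn: !Sub "arn:${AWS::Partition}:databrew:${AWS::Region}:${AWS::AccountId}:dataset/${SampleDataSet}"
    # Rules: unchanged from the original template
SampleProfileJob:
  Type: AWS::DataBrew::Job
  Properties:
    Name: SampleProfileJob
    Type: PROFILE
    # Assumption: ${SampleRuleSet} resolves (via Ref) to the ruleset name.
    ValidationConfigurations:
      - RulesetArn: !Sub "arn:${AWS::Partition}:databrew:${AWS::Region}:${AWS::AccountId}:ruleset/${SampleRuleSet}"
    # RoleArn, DatasetName, Timeout, OutputLocation: unchanged
```

Because `!Sub` references the logical IDs, CloudFormation should infer the creation order on its own, so the explicit `DependsOn` entries would probably become redundant.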