I'm trying to create a graph and corresponding alert for lag for my Kafka consumer using AWS-CDK, but I can't seem to get it to generate a valid configuration.
The metric I'm reporting is a `Count` with two dimensions: `topic` and `partition`. Now I would like to create a graph with one line for each partition, as well as one line for the sum of those values. Essentially, I'm trying to replicate this graph in CDK:
// To create one metric per partition, I'm using a MathExpression with a SEARCH
// since I won't necessarily know how many partitions there are.
const lagPerPartition = new MathExpression({
  expression: "SEARCH(' {kafkajs-canary-app/Consumer, topic, partition} MetricName=OffsetLag topic=test-topic', 'Average', 300)",
  // This feels suspicious to me. `usingMetrics` is required, but I don't have any use
  // for a metric in my expression, and adding a metric I don't use doesn't seem to change anything
  usingMetrics: {}
});
// Now to sum those values
const totalLag = new MathExpression({
  expression: 'SUM(lagPerPartition)',
  usingMetrics: {
    lagPerPartition
  },
  label: "Total lag"
});
// Now to create alarms
const lagPerPartitionAlarm = lagPerPartition.createAlarm(this, 'ConsumerOffsetLagPerPartition', {
  alarmName: 'Offset Lag per Partition',
  alarmDescription: 'Consumer has high lag on one or more partitions',
  threshold: 100,
  evaluationPeriods: 1,
  treatMissingData: TreatMissingData.NOT_BREACHING,
});
const totalLagAlarm = totalLag.createAlarm(this, 'ConsumerOffsetLagTotal', {
  alarmName: 'Total Offset Lag',
  alarmDescription: 'Consumer has high lag across all partitions',
  threshold: 200,
  evaluationPeriods: 1,
  treatMissingData: TreatMissingData.NOT_BREACHING,
});
// Finally we add it to our dashboard
const dashboard = new Dashboard(this, 'Dashboard', { dashboardName: 'Health' });
dashboard.addWidgets(new GraphWidget({
  title: 'Consumer lag',
  left: [lagPerPartition, totalLag],
  stacked: false,
  width: 8,
  leftAnnotations: [
    lagPerPartitionAlarm.toAnnotation(),
    totalLagAlarm.toAnnotation()
  ]
}));
Deploying this stack fails with an error from CloudFormation. The error says one of the expressions is invalid, but not which one or why, and I don't understand why I can create what I believe is the same thing through the console UI:
Alarm contains invalid expressions. (Service: AmazonCloudWatch; Status Code: 400; Error Code: ValidationError; Request ID: 487e9db7-0035-4fca-90ac-8d0227c589ea; Proxy: null)
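
For reference, here is how the SEARCH string used above breaks down into its parts. This is only a sketch with no CDK dependency: `buildSearchExpression` and `SearchSpec` are hypothetical names, not part of aws-cdk-lib, and the helper inserts values verbatim without any quoting or escaping. It reproduces the expression from the snippet above, apart from the leading space inside the quotes, which it omits.

```typescript
// Hypothetical helper (not part of aws-cdk-lib): assembles a CloudWatch
// SEARCH expression string from its individual parts.
interface SearchSpec {
  namespace: string;               // metric namespace
  dimensionNames: string[];        // dimension names that make up the schema
  filters: Record<string, string>; // key=value search terms, kept in insertion order
  statistic: string;               // e.g. 'Average'
  periodSeconds: number;           // e.g. 300
}

function buildSearchExpression(spec: SearchSpec): string {
  // Schema part: {Namespace, dim1, dim2}
  const schema = `{${[spec.namespace, ...spec.dimensionNames].join(', ')}}`;
  // Search terms: values are inserted as-is, with no quoting or escaping.
  const terms = Object.keys(spec.filters)
    .map((key) => `${key}=${spec.filters[key]}`)
    .join(' ');
  return `SEARCH('${schema} ${terms}', '${spec.statistic}', ${spec.periodSeconds})`;
}

const lagSearch = buildSearchExpression({
  namespace: 'kafkajs-canary-app/Consumer',
  dimensionNames: ['topic', 'partition'],
  filters: { MetricName: 'OffsetLag', topic: 'test-topic' },
  statistic: 'Average',
  periodSeconds: 300,
});

console.log(lagSearch);
```

The resulting string could be passed as the `expression` of the `MathExpression` in place of the hand-written literal, which makes it easier to see which part is the schema, which parts are filters, and which statistic and period are requested.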