I have a requirement to send AWS Backup events (specifically failed backups, and backups where the Windows VSS step failed) to a centralized Opsgenie alerting system. AWS directed us to use EventBridge to parse the JSON object produced by AWS Backup and determine whether the VSS portion failed.

SNS is not a viable option because we cannot 'OR' the two rules together in one filter policy, and we only have one endpoint, so a second subscription to the same topic would overwrite the first. That said, I did successfully send messages to Opsgenie via SNS. So far, I have had no luck with EventBridge.

I have started to implement most of this in Terraform. I realize Terraform has some limitations with EventBridge: my two rules cannot be tied to the custom bus I create, so I have to do that step manually. I also need to create the Opsgenie API integration manually, as Opsgenie does not seem to support the 'EventBridge' type yet. Only the older CloudWatch Events integration that ties into SNS seems to be available. Below is my Terraform for reference:

# This module creates an opsgenie team and will tie in existing emails to the team to use with the integration.

module "opsgenie_team" {
  source  = "app.terraform.io/etc.../opsgenie"
  version = "1.1.0"

  team_name         = "test team"
  team_description  = "test environment."
  team_admin_emails = var.opsgenie_team_admins
  team_user_emails  = var.opsgenie_team_users

  suppress_cloudwatch_events_notifications = var.opsgenie_suppress_cloudwatch_events_notifications
  suppress_cloudwatch_notifications        = var.opsgenie_suppress_cloudwatch_notifications
  suppress_generic_sns_notifications       = var.opsgenie_suppress_generic_sns_notifications
}

# Step commented out since 'Webhook' doesn't work. 
#
# resource "opsgenie_api_integration" "opsgenie" {
#   name = "api-based-int-2"
#   type = "Webhook"
#
#   responders {
#     type = "user"
#     id   = data.opsgenie_user.test.id
#   }
#
#   enabled                = true
#   allow_write_access     = true
#   suppress_notifications = false
#   webhook_url            = module.opsgenie_team.cloudwatch_events_integration_sns_endpoint
# }

resource "aws_cloudwatch_event_api_destination" "opsgenie" {
  name                             = "Test"
  description                      = "Connection to OpsGenie"
  invocation_endpoint              = module.opsgenie_team.cloudwatch_events_integration_sns_endpoint
  http_method                      = "POST"
  invocation_rate_limit_per_second = 20
  connection_arn                   = aws_cloudwatch_event_connection.opsgenie.arn
}

resource "aws_cloudwatch_event_connection" "opsgenie" {
  name               = "opsgenie-event-connection"
  description        = "Connection to OpsGenie"
  authorization_type = "API_KEY"

  # Verified key seems to be valid on integration API 
  # https://api.opsgenie.com/v2/integrations
  
  auth_parameters {
    api_key {
      key   = module.opsgenie_team.cloudwatch_events_integration_id
      value = module.opsgenie_team.cloudwatch_events_integration_api_key
    }
  }
}

# Opsgenie ID created with the manual integration step.

data "aws_cloudwatch_event_source" "opsgenie" {
  name_prefix = "aws.partner/opsgenie.com/MY-OPSGENIE-ID"
}

resource "aws_cloudwatch_event_bus" "opsgenie" {
  name              = data.aws_cloudwatch_event_source.opsgenie.name
  event_source_name = data.aws_cloudwatch_event_source.opsgenie.name
}

# Two rules I need to filter on, commented out as they cannot be tied to a custom bus with 
# terraform.

# resource "aws_cloudwatch_event_rule" "opsgenie_backup_failures" {
#   name        = "capture-generic-backup-failures"
#   description = "Capture all other backup failures"
#
#   event_pattern = <<EOF
#   {
#     "State": [
#       {
#         "anything-but": "COMPLETED"
#       }
#     ]
#   }
# EOF
# }
#
# resource "aws_cloudwatch_event_rule" "opsgenie_vss_failures" {
#   name        = "capture-vss-failures"
#   description = "Capture VSS Backup failures"
#
#   event_pattern = <<EOF
#   {
#     "detail-type" : [
#        "Windows VSS Backup attempt failed because either Instance or SSM Agent has invalid state or insufficient privileges."
#      ]
#   }
# EOF
# }

The event bus and API destination seem to be created correctly, and I can take the API key used to communicate with Opsgenie and use it in Postman to hit an Opsgenie endpoint. I manually created the rules and tied them to the custom bus. I even left the patterns wide open, hoping to capture any AWS Backup events, but nothing has arrived yet.
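For reference, the manual rule step might be expressible in Terraform roughly as below. This is only a sketch under several assumptions: an AWS provider version (3.x or later) in which `aws_cloudwatch_event_rule` and `aws_cloudwatch_event_target` accept `event_bus_name`; that AWS Backup job events carry `source` `aws.backup`, detail-type `Backup Job State Change`, and a lower-case `detail.state` field; and a hypothetical variable `eventbridge_invoke_role_arn` for a role allowed to invoke the API destination. Events from AWS services are delivered to the account's default bus, so the sketch puts the rule there rather than on the partner bus.

```hcl
# Sketch only, not a verified configuration.
resource "aws_cloudwatch_event_rule" "opsgenie_backup_failures" {
  name        = "capture-generic-backup-failures"
  description = "Capture all other backup failures"

  # AWS service events land on the default bus. To attach the rule to the
  # custom bus instead, use aws_cloudwatch_event_bus.opsgenie.name here.
  event_bus_name = "default"

  # Assumption: field names follow the AWS Backup event format.
  event_pattern = jsonencode({
    source        = ["aws.backup"]
    "detail-type" = ["Backup Job State Change"]
    detail = {
      state = [{ "anything-but" = "COMPLETED" }]
    }
  })
}

resource "aws_cloudwatch_event_target" "opsgenie_backup_failures" {
  rule           = aws_cloudwatch_event_rule.opsgenie_backup_failures.name
  event_bus_name = aws_cloudwatch_event_rule.opsgenie_backup_failures.event_bus_name
  arn            = aws_cloudwatch_event_api_destination.opsgenie.arn

  # Hypothetical variable: an IAM role that EventBridge can assume to call
  # the API destination.
  role_arn = var.eventbridge_invoke_role_arn
}
```

Note that an `anything-but` match on `COMPLETED` would also match any in-progress job states, so the pattern may need narrowing to the failure states of interest.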

I feel like I'm close, but missing a critical detail (or two). Any help is greatly appreciated.

1 Answer

I posed the same question to Atlassian, and they sent me this email:

We do have an open feature request for a direct, inbound integration with EventBridge - I've added your info and a +1 to the request, so hopefully we'll be able to add that in the future. For reference, the request ID is OGS-4502.

In the meantime, though, you're correct - you'd need to either use our CloudWatch Events integration or a direct SNS integration, instead, which may restrict some of the functionality you would get using EventBridge directly. With that said - Opsgenie does offer robust filtering functionality via the advanced integration settings and alert policies that may be able to achieve the same sort of filtering you would want to set up on the EventBridge side of things:

https://support.atlassian.com/opsgenie/docs/use-advanced-integration-settings/
https://support.atlassian.com/opsgenie/docs/create-and-manage-global-alert-policies/

So, for now, the answer is to send all events to the Opsgenie endpoint and filter them there with 'opsgenie_integration_action' resources.
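A minimal sketch of what such a resource might look like follows. It assumes that the module output `cloudwatch_events_integration_id` is the ID of the CloudWatch Events integration, and that the text being matched ends up in the alert's `message` and `description` fields. Which field actually carries the VSS text or the job state depends on how the integration maps the incoming payload, so the conditions are illustrative rather than tested.

```hcl
# Sketch only: field mappings and expected values are assumptions.
resource "opsgenie_integration_action" "backup_alerts" {
  integration_id = module.opsgenie_team.cloudwatch_events_integration_id

  # Raise an alert when the incoming message mentions a VSS failure.
  create {
    name = "Create alert for VSS backup failures"

    filter {
      type = "match-all-conditions"

      conditions {
        field          = "message"
        operation      = "contains"
        expected_value = "Windows VSS Backup attempt failed"
      }
    }
  }

  # Drop events for backup jobs that completed normally.
  ignore {
    name = "Ignore completed backup jobs"

    filter {
      type = "match-all-conditions"

      conditions {
        field          = "description"
        operation      = "contains"
        expected_value = "COMPLETED"
      }
    }
  }
}
```

Keeping both actions in one resource keeps the 'OR' logic (VSS failure or any non-completed job) on the Opsgenie side, which is what the SNS filter policy could not express.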
