
I am trying to stream AWS CloudWatch Logs to ES via Kinesis Firehose. The Terraform code below is giving an error. Any suggestions? The error is:

aws_cloudwatch_log_subscription_filter.test_kinesis_logfilter: 1 error(s) occurred:

aws_cloudwatch_log_subscription_filter.test_kinesis_logfilter: InvalidParameterException: Could not deliver test message to specified Firehose stream. Check if the given Firehose stream is in ACTIVE state.

resource "aws_s3_bucket" "bucket" {
  bucket = "cw-kinesis-es-bucket"
  acl    = "private"
}

resource "aws_iam_role" "firehose_role" {
  name = "firehose_test_role"

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "firehose.amazonaws.com"
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}
EOF
}

resource "aws_elasticsearch_domain" "es" {
  domain_name           = "firehose-es-test"
  elasticsearch_version = "1.5"
  cluster_config {
    instance_type = "t2.micro.elasticsearch"
  }
  ebs_options {
    ebs_enabled = true
    volume_size = 10
  }

  advanced_options {
    "rest.action.multi.allow_explicit_index" = "true"
  }

  access_policies = <<CONFIG
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": "es:*",
            "Principal": "*",
            "Effect": "Allow",
            "Condition": {
                "IpAddress": {"aws:SourceIp": ["xxxxx"]}
            }
        }
    ]
}
CONFIG

  snapshot_options {
    automated_snapshot_start_hour = 23
  }

  tags {
    Domain = "TestDomain"
  }
}

resource "aws_kinesis_firehose_delivery_stream" "test_stream" {
  name        = "terraform-kinesis-firehose-test-stream"
  destination = "elasticsearch"

  s3_configuration {
    role_arn           = "${aws_iam_role.firehose_role.arn}"
    bucket_arn         = "${aws_s3_bucket.bucket.arn}"
    buffer_size        = 10
    buffer_interval    = 400
    compression_format = "GZIP"
  }

  elasticsearch_configuration {
    domain_arn = "${aws_elasticsearch_domain.es.arn}"
    role_arn   = "${aws_iam_role.firehose_role.arn}"
    index_name = "test"
    type_name  = "test"
  }
}

resource "aws_iam_role" "iam_for_lambda" {
  name = "iam_for_lambda"
  assume_role_policy = <<EOF
  {
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}
EOF
}

resource "aws_cloudwatch_log_subscription_filter" "test_kinesis_logfilter" {
  name            = "test_kinesis_logfilter"
  role_arn        = "${aws_iam_role.iam_for_lambda.arn}"
  log_group_name  = "loggorup.log"
  filter_pattern  = ""
  destination_arn = "${aws_kinesis_firehose_delivery_stream.test_stream.arn}"
}


1 Answer


In this configuration you are directing Cloudwatch Logs to send log records to Kinesis Firehose, which is in turn configured to write the data it receives to both S3 and ElasticSearch. Thus the AWS services you are using are talking to each other as follows:

Cloudwatch Logs talks to Kinesis Firehose, which in turn talks to both S3 and ElasticSearch

In order for one AWS service to talk to another, the first service must assume a role that grants it access to do so. In IAM terminology, "assuming a role" means to temporarily act with the privileges granted to that role. An AWS IAM role has two key parts:

  • The assume role policy, that controls which services and/or users may assume the role.
  • The policies controlling what the role grants access to. This decides what a service or user can do once it has assumed the role.

Two separate roles are needed here. One role will grant Cloudwatch Logs access to talk to Kinesis Firehose, while the second will grant Kinesis Firehose access to talk to both S3 and ElasticSearch.

For the rest of this answer, I will assume that Terraform is running as a user with full administrative access to an AWS account. If this is not true, it will first be necessary to ensure that Terraform is running as an IAM principal that has access to create and pass roles.
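If Terraform is instead running with more limited access, the principal it authenticates as needs permission to create roles, attach policies to them, and pass them to other services. A minimal sketch of the relevant statement follows (an assumption, not a verified minimum; iam:PassRole is the key action that lets Terraform hand the new roles to Cloudwatch Logs and Kinesis Firehose):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "iam:CreateRole",
        "iam:PutRolePolicy",
        "iam:GetRole",
        "iam:PassRole"
      ],
      "Resource": "*"
    }
  ]
}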


Access for Cloudwatch Logs to Kinesis Firehose

In the example given in the question, the aws_cloudwatch_log_subscription_filter has a role_arn referring to a role whose assume_role_policy is for AWS Lambda, so Cloudwatch Logs does not have access to assume this role.

To fix this, the assume role policy can be changed to use the service name for Cloudwatch Logs:

resource "aws_iam_role" "cloudwatch_logs" {
  name = "cloudwatch_logs_to_firehose"
  assume_role_policy = jsonencode({
    "Version": "2012-10-17",
    "Statement": [
      {
        "Action": "sts:AssumeRole",
        "Principal": {
          "Service": "logs.us-east-1.amazonaws.com"
        },
        "Effect": "Allow",
        "Sid": "",
      },
    ],
  })
}

The above permits the Cloudwatch Logs service to assume the role. Now the role needs an access policy that permits writing to the Firehose Delivery Stream:

resource "aws_iam_role_policy" "cloudwatch_logs" {
  role = aws_iam_role.cloudwatch_logs.name

  policy = jsonencode({
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": ["firehose:*"],
        "Resource": [aws_kinesis_firehose_delivery_stream.test_stream.arn],
      },
    ],
  })
}

The above grants the Cloudwatch Logs service access to call into any Kinesis Firehose action as long as it targets the specific delivery stream created by this Terraform configuration. This is more access than is strictly necessary; for more information, see Actions and Condition Context Keys for Amazon Kinesis Firehose.
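If granting firehose:* is too broad for your environment, a narrower sketch that grants only the write operations should suffice here, assuming the subscription only ever needs to put records (firehose:PutRecord and firehose:PutRecordBatch are the standard write actions):

resource "aws_iam_role_policy" "cloudwatch_logs" {
  role = aws_iam_role.cloudwatch_logs.name

  policy = jsonencode({
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": [
          "firehose:PutRecord",
          "firehose:PutRecordBatch"
        ],
        "Resource": [aws_kinesis_firehose_delivery_stream.test_stream.arn]
      }
    ]
  })
}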

To complete this, the aws_cloudwatch_log_subscription_filter resource must be updated to refer to this new role:

resource "aws_cloudwatch_log_subscription_filter" "test_kinesis_logfilter" {
  name            = "test_kinesis_logfilter"
  role_arn        = aws_iam_role.cloudwatch_logs.arn
  log_group_name  = "loggorup.log"
  filter_pattern  = ""
  destination_arn = aws_kinesis_firehose_delivery_stream.test_stream.arn

  # Wait until the role has required access before creating
  depends_on = [aws_iam_role_policy.cloudwatch_logs]
}

Unfortunately due to the internal design of AWS IAM, it can often take several minutes for a policy change to come into effect after Terraform submits it, so sometimes a policy-related error will occur when trying to create a new resource using a policy very soon after the policy itself was created. In this case, it's often sufficient to simply wait 10 minutes and then run Terraform again, at which point it should resume where it left off and retry creating the resource.
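If waiting and re-running becomes tedious, one workaround is to insert an explicit delay between creating the policy and creating the resources that use it. This sketch uses the separate hashicorp/time provider, which is not part of the original configuration:

resource "time_sleep" "wait_for_iam" {
  # Give IAM time to propagate the new role policy.
  depends_on      = [aws_iam_role_policy.cloudwatch_logs]
  create_duration = "30s"
}

The subscription filter can then declare depends_on = [time_sleep.wait_for_iam] so that it is not created until the delay has elapsed.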


Access for Kinesis Firehose to S3 and Amazon ElasticSearch

The example given in the question already has an IAM role with a suitable assume role policy for Kinesis Firehose:

resource "aws_iam_role" "firehose_role" {
  name = "firehose_test_role"

  assume_role_policy = jsonencode({
    "Version": "2012-10-17",
    "Statement": [
      {
        "Action": "sts:AssumeRole",
        "Principal": {
          "Service": "firehose.amazonaws.com"
        },
        "Effect": "Allow",
        "Sid": ""
      }
    ]
  })
}

The above grants Kinesis Firehose access to assume this role. As before, this role also needs an access policy to grant users of the role access to the target S3 bucket:

resource "aws_iam_role_policy" "firehose_role" {
  role = aws_iam_role.firehose_role.name

  policy = jsonencode({
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": ["s3:*"],
        "Resource": [
          aws_s3_bucket.bucket.arn,
          "${aws_s3_bucket.bucket.arn}/*"
        ]
      },
      {
        "Effect": "Allow",
        "Action": ["es:*"],
        "Resource": [
          aws_elasticsearch_domain.es.arn,
          "${aws_elasticsearch_domain.es.arn}/*"
        ]
      },
      {
        "Effect": "Allow",
        "Action": [
            "logs:PutLogEvents"
        ],
        "Resource": [
            "arn:aws:logs:*:*:log-group:*:log-stream:*"
        ]
      },
    ],
  })
}

The above policy allows Kinesis Firehose to perform any action on the created S3 bucket, any action on the created ElasticSearch domain, and to write log events to any log stream in Cloudwatch Logs. The final statement is not strictly necessary, but it is important if logging is enabled for the Firehose Delivery Stream, since without it Kinesis Firehose is unable to write its error logs back to Cloudwatch Logs.

Again, this is more access than strictly necessary. For more information on the specific actions supported, see the Actions and Condition Context Keys documentation for Amazon S3, Amazon Elasticsearch Service, and Cloudwatch Logs.
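For illustration, a narrower version of this policy could restrict the role to the actions that the Firehose documentation uses in its example policies. Treat these action lists as an assumption to verify against the current documentation rather than a definitive minimum:

resource "aws_iam_role_policy" "firehose_role" {
  role = aws_iam_role.firehose_role.name

  policy = jsonencode({
    "Version": "2012-10-17",
    "Statement": [
      {
        # Object and bucket operations Firehose needs for S3 delivery.
        "Effect": "Allow",
        "Action": [
          "s3:AbortMultipartUpload",
          "s3:GetBucketLocation",
          "s3:GetObject",
          "s3:ListBucket",
          "s3:ListBucketMultipartUploads",
          "s3:PutObject"
        ],
        "Resource": [
          aws_s3_bucket.bucket.arn,
          "${aws_s3_bucket.bucket.arn}/*"
        ]
      },
      {
        # Domain description plus HTTP writes for ElasticSearch delivery.
        "Effect": "Allow",
        "Action": [
          "es:DescribeElasticsearchDomain",
          "es:DescribeElasticsearchDomains",
          "es:DescribeElasticsearchDomainConfig",
          "es:ESHttpPost",
          "es:ESHttpPut"
        ],
        "Resource": [
          aws_elasticsearch_domain.es.arn,
          "${aws_elasticsearch_domain.es.arn}/*"
        ]
      },
      {
        # Error logging back to Cloudwatch Logs, if enabled.
        "Effect": "Allow",
        "Action": ["logs:PutLogEvents"],
        "Resource": ["arn:aws:logs:*:*:log-group:*:log-stream:*"]
      }
    ]
  })
}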

Since this single role has access to write to both S3 and to ElasticSearch, it can be specified for both of these delivery configurations in the Kinesis Firehose delivery stream:

resource "aws_kinesis_firehose_delivery_stream" "test_stream" {
  name        = "terraform-kinesis-firehose-test-stream"
  destination = "elasticsearch"

  s3_configuration {
    role_arn           = aws_iam_role.firehose_role.arn
    bucket_arn         = aws_s3_bucket.bucket.arn
    buffer_size        = 10
    buffer_interval    = 400
    compression_format = "GZIP"
  }

  elasticsearch_configuration {
    domain_arn = aws_elasticsearch_domain.es.arn
    role_arn   = aws_iam_role.firehose_role.arn
    index_name = "test"
    type_name  = "test"
  }

  # Wait until access has been granted before creating the firehose
  # delivery stream.
  depends_on = [aws_iam_role_policy.firehose_role]
}

With all of the above wiring complete, the services should have the access they need to connect the parts of this delivery pipeline.

This same general pattern applies to any connection between two AWS services. The important information needed for each case is:

  • The service name for the service that will initiate the requests, such as logs.us-east-1.amazonaws.com or firehose.amazonaws.com. These are unfortunately generally poorly documented and hard to find, but can usually be found in policy examples within each service's user guide.
  • The names of the actions that need to be granted. The full set of actions for each service can be found in AWS Service Actions and Condition Context Keys for Use in IAM Policies. Unfortunately again the documentation for specifically which actions are required for a given service-to-service integration is generally rather lacking, but in simple environments (notwithstanding any hard regulatory requirements or organizational policies around access) it usually suffices to grant access to all actions for a given service, using the wildcard syntax used in the above examples.
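In Terraform terms, the resulting pattern for wiring any service A to service B is always the same pair of resources, roughly as sketched here (the service names and actions are placeholders, not real values):

# Role that service A assumes when it calls service B.
resource "aws_iam_role" "a_to_b" {
  name = "service_a_to_service_b"

  assume_role_policy = jsonencode({
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": "sts:AssumeRole",
        "Principal": { "Service": "service-a.amazonaws.com" }
      }
    ]
  })
}

# What service A may do in service B once it has assumed the role.
resource "aws_iam_role_policy" "a_to_b" {
  role = aws_iam_role.a_to_b.name

  policy = jsonencode({
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": ["service-b:*"],
        "Resource": ["*"]
      }
    ]
  })
}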
Martin Atkins
  • Thanks a lot for the detailed information Martin :) I will go through it and try it out and let you know how it goes. – Bond May 13 '17 at 04:50
  • I have all the code pieces put together and applied but my logs from cloudwatch are not getting delivered to ES for some reason. I ran through "test with demo data" as well. I am seeing this error: arn:aws:es:us-east-1:637892730498:domain/firehose-es-test The Lambda function was successfully invoked but it returned an error result. Lambda.FunctionError 2". I didn't have to put any lambda function myself. Not sure if I need to do one. – Bond May 15 '17 at 23:01
  • I'm also not sure where in the process Lambda is involved here. Where exactly did you see that error? i.e. was it from Terraform, or did you see it in the AWS console somewhere, or something else? – Martin Atkins May 15 '17 at 23:51
  • The error was shown in AWS console. I enabled S3 and ES logs on Kinesis to see why the data is not reaching ES. I am not sure how and where it is executing Lambda. My understanding is that if we setup the above code block, it should just stream the logs from the cloudwatch log group to ES via Kinesis Firehose. Am I missing something here? – Bond May 16 '17 at 03:22
  • After more troubleshooting, it looks like the actual error is : The data could not be decoded as UTF-8 as in {"attemptsMade":0,"arrivalTimestamp":1494910856979,"errorCode":"InvalidEncodingException","errorMessage":"The data could not be decoded as UTF-8","attemptEndingTimestamp":1494911157268,"rawData":"H4sIAAAAAAAAADWO0QqCMBiFX2XsWiJBQ70LUW8sIYUuQmLpnxvpJttMQnz3ZtrlxzmcQj0RXNBWNMkErmkSKoTfZSrmWQLoV1fBQlWS9ZoLHrNUgFQ5u5a8XvYHrBSfM6rWumdHQpDNjtuM7vr333IPnOtZfbxG4pmjTQ5tegEIK1YvxBlEgraZIPFFtlhgnyzOKmQQqFOzwXM5fj/HcTewAAAA=","esDocumentId":null,"esIndexName":null,"esTypeName":null} – Bond May 16 '17 at 05:24
  • Assuming that this error is coming out of the Kinesis Firehose logs, it sounds like all of the IAM authorization stuff is working correctly but the data being sent to ElasticSearch is not valid. A base64 decode of that rawData value yields a bunch of unintelligible binary data, whereas it looks like ES is expecting UTF-8 encoded text. This part is outside of my familiarity, unfortunately. It seems like this is now a different problem, so might be better to ask a fresh question so others with more Firehose/ES expertise are more likely to see it. – Martin Atkins May 16 '17 at 17:07
  • Sure thing Martin. Thanks for checking. – Bond May 16 '17 at 20:43
  • @Bond, you should give Martin the check. This is amazing information, its saving my life! – crthompson Mar 14 '20 at 15:17