I'm trying to get data from an on-premise SQL Server 2016 Enterprise Edition instance into the cloud. I have hit a roadblock, so if anyone has any guidance as to a workaround, I'd really appreciate you sharing your knowledge!

I'm planning on using AWS Database Migration Service (aws.amazon.com), which I'm going to call 'DMS' for this post. The database must remain on-premise for regulatory reasons, so I have a need to continually capture data from this database and ship it to the cloud. I'm going to use Change Data Capture (learn.microsoft.com) for this aspect.

This use case is explicitly called out in the DMS docs, so it seems like the appropriate tool. In addition, I see from this 2018 blog post that Kinesis Data Streams are a valid target for DMS. That's great; I want to use Kinesis to process the data from CDC downstream.

The problem is that the Terraform docs for DMS targets (terraform.io) don't list Kinesis as an endpoint type. Here's an issue on the Terraform GitHub project (github.com) where someone else has noticed the same thing. And an associated PR (github.com), which looks like it should provide a fix. Although it seems to depend on another fix, so I'm not holding my breath.

Now, some specific questions:

  1. In the thread below the GitHub issue, someone mentions using a mixture of CloudFormation and Terraform. Some quick searching throws up aws_cloudformation_stack (terraform.io) as a means to achieve this. Is that correct?
  2. Should I in fact hold my breath for Hashicorp to merge in the DMS fixes?
  3. Are there any other ways through this problem that I haven't thought of?
Oscar Barlow
  • My suggestion, don't waste time on DMS at all, it never works. Manually migration is the way to go. – BMW Dec 04 '19 at 05:52

1 Answer

Not sure of the SO etiquette around answering your own question ([Meta is a little unclear](https://meta.stackexchange.com/questions/17845/etiquette-for-answering-your-own-question)), so here goes:

Connecting DMS and Kinesis Data Streams with Terraform + CloudFormation

aws_cloudformation_stack does work. The CloudFormation template is actually relatively simple. The first code block below shows my Terraform. I'm using Terraform 0.12's templatefile function to interpolate parameters into the CloudFormation JSON. Where you see <Xxx> or similar, those are placeholders for conventional identifiers for my environment that I'd rather not share.

resource "aws_cloudformation_stack" "kinesis_target" {
  name = "xxx-xxx-kinesis-target"

  template_body = templatefile("kinesis-target-stack.json.tpl",
    {
      identifier_alphanumeric = "<Xxx><Xxx>KinesisTarget",
      identifier              = "<xxx>-<xxx>-kinesis-target",
      access_role_arn         = aws_iam_role.dms_data_stream_role.arn,
      stream_arn              = "${data.terraform_remote_state.data_stream.outputs.arn}",
      output_alphanumeric     = var.kinesis_dms_target_arn_output_key
    }
  )
}

One further thing to note: output_alphanumeric is the name of the key under which the stack exports the ARN of the component it creates. It turns out not to be entirely straightforward to use this value elsewhere in your Terraform code. See [Data Source: aws_cloudformation_stack](https://www.terraform.io/docs/providers/aws/d/cloudformation_stack.html).

The complication this introduces is that you'll have to explicitly state dependencies in your code, as otherwise the output from the CloudFormation stack won't be available when Terraform tries to interpolate that value into other bits of your codebase. Or you'll have to roll out your changes piecemeal, commenting and uncommenting resources to satisfy the dependencies manually. Gross.
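To illustrate, here's a sketch of how a consuming resource might reference the stack's output. The replication task and its other endpoint/instance references are hypothetical names for my environment, not something from the stack above; the key point is the outputs map lookup and the explicit depends_on:

```hcl
# Hypothetical consumer of the CloudFormation stack's output.
# The outputs map on aws_cloudformation_stack exposes the stack's
# Outputs by key, so we can look up the DMS endpoint ARN by the
# same key we passed in as output_alphanumeric.
resource "aws_dms_replication_task" "cdc_to_kinesis" {
  replication_task_id      = "xxx-xxx-cdc-task"
  migration_type           = "cdc"
  source_endpoint_arn      = aws_dms_endpoint.source.endpoint_arn          # hypothetical
  replication_instance_arn = aws_dms_replication_instance.this.replication_instance_arn # hypothetical
  table_mappings           = file("table-mappings.json")

  target_endpoint_arn = aws_cloudformation_stack.kinesis_target.outputs[var.kinesis_dms_target_arn_output_key]

  # Make the ordering explicit so the stack (and its output) exists
  # before Terraform tries to read it.
  depends_on = [aws_cloudformation_stack.kinesis_target]
}
```

Referencing the managed resource's outputs attribute like this gives Terraform an implicit dependency too, but being explicit saves you from the piecemeal comment/uncomment dance when refactoring.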

The CloudFormation JSON is fairly simple:

{
  "Resources": {
    "${identifier_alphanumeric}": {
      "Type": "AWS::DMS::Endpoint",
      "Properties": {
        "EndpointIdentifier": "${identifier}",
        "EndpointType": "target",
        "EngineName": "kinesis",
        "KinesisSettings": {
          "MessageFormat": "json",
          "ServiceAccessRoleArn": "${access_role_arn}",
          "StreamArn": "${stream_arn}"
        }
      }
    }
  },
  "Outputs": {
    "${output_alphanumeric}": {
      "Description": "ARN of DMS Target for Kinesis Data Stream",
      "Value": { "Ref" : "${identifier_alphanumeric}" },
      "Export": {
        "Name": "${output_alphanumeric}"
      }
    }
  }
}

The thing to note about this is that although the [AWS::DMS::Endpoint docs](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-dms-endpoint.html) claim that the EngineName property is mandatory and must be one of the values in the list provided (which does not include kinesis), this is not the case.

I've left EngineName in place for clarity, and in the expectation that some day kinesis will be added to the list and perhaps the validation rules will be tightened. To be fair, I haven't tried deploying this infrastructure without EngineName present; either way, it rolled out without problems.

So, that answers my question (1).

I haven't tried expressing this in Terraform, which would be a lot neater and would obviate the need to explicitly state dependencies between resources. I'm not sure if Terraform does engine_name validation beyond what CloudFormation does; if it does, the CloudFormation approach is probably the best for the time being. If not, I'd way rather rewrite the above as pure Terraform. In the meantime, what I have works.
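For what it's worth, if and when the provider fix is merged, I'd guess the pure-Terraform version would look something like the sketch below. This is speculative, based on how other aws_dms_endpoint engine settings blocks are shaped, and reuses the same role and remote-state references as my Terraform above; I haven't been able to apply it:

```hcl
# Speculative: requires a provider version where the DMS Kinesis
# target PR has been merged. Mirrors the CloudFormation template above.
resource "aws_dms_endpoint" "kinesis_target" {
  endpoint_id   = "xxx-xxx-kinesis-target"
  endpoint_type = "target"
  engine_name   = "kinesis"

  kinesis_settings {
    message_format          = "json"
    service_access_role_arn = aws_iam_role.dms_data_stream_role.arn
    stream_arn              = data.terraform_remote_state.data_stream.outputs.arn
  }
}
```

This would make the endpoint ARN available directly as aws_dms_endpoint.kinesis_target.endpoint_arn, with dependencies tracked for free.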

Which also kind of answers my question (2). I don't need to hold my breath for these fixes to be merged.

Still open to input on (3)! I don't yet have end-to-end validation that DMS works as expected, as I'm blocked by a networking issue. A commenter has expressed skepticism about DMS, which is a little concerning...

I'll report back when I've got something more to say on that!
