
My CDK pipeline stack has this code:

const pipeline = new CodePipeline(this, id, {
    pipelineName: id,
    synth: new CodeBuildStep("Synth", {
            input: CodePipelineSource.connection("user/example4-be", "main", {
                connectionArn: "arn:aws:codestar-connections:us-east-1:111...1111:connection/1111-1111.....1111",
            }),
            installCommands: [],
            commands: []
        }
    ),
})

which makes the code tightly coupled to the repository it lives in (user/example4-be) and to the GitHub connection used to access it (arn:aws:codestar-connections:...). This would break if someone forked the repo and wanted a parallel pipeline. I feel these two values should be configuration, not part of the code.

Is there a way, using CDK and CodePipeline, to supply these values as external variables? I guess the variables should be per-pipeline if possible, but I'm not entirely sure.

Pablo Fernandez
  • What do you mean by "external variables"? Where would you like to define them? – gshpychka Jan 25 '22 at 10:59
  • You can store the values in System Manager Parameter store and read it in CDK. Docs: https://docs.aws.amazon.com/cdk/v2/guide/get_ssm_value.html I had used this approach [here](https://stackoverflow.com/a/70737033/17896613) – Kaustubh Khavnekar Jan 25 '22 at 17:29
  • @gshpychka: no idea, maybe somewhere in the CodePipeline in the AWS Console? – Pablo Fernandez Jan 25 '22 at 22:44

2 Answers


Subclass Stack and accept the source configuration input as a custom prop type.¹

// SourceConfigPipelineStack.ts

import * as cdk from 'aws-cdk-lib';
import * as pipelines from 'aws-cdk-lib/pipelines';
import { Construct } from 'constructs';

interface SourceConfigPipelineStackProps extends cdk.StackProps {
  source: pipelines.CodePipelineSource;
}

export class SourceConfigPipelineStack extends cdk.Stack {
  constructor(
    scope: Construct,
    id: string,
    props: SourceConfigPipelineStackProps
  ) {
    // forward props so StackProps values (env, terminationProtection, ...) are applied
    super(scope, id, props);

    const pipeline = new pipelines.CodePipeline(this, id, {
      pipelineName: id,
      synth: new pipelines.CodeBuildStep('Synth', {
        input: props.source, // the source is injected by the consumer
        installCommands: [],
        commands: [],
      }),
    });
  }
}

Pipeline consumers then pass their own source as configuration:

// app.ts

new SourceConfigPipelineStack(app, 'MyPipelineStack', {
  env,
  source: pipelines.CodePipelineSource.connection('user/example4-be', 'main', {
    connectionArn:
      'arn:aws:codestar-connections:us-east-1:111...1111:connection/1111-1111.....1111',
  }),
});

Edit: Is it "bad" to put ARN configuration in code?

Not according to AWS. The CDK "best practices" doc says it's reasonable to hardcode cross-stack ARNs:

When the two stacks are in different AWS CDK apps, use a static from method to import an externally-defined resource based on its ARN ... (for example, Table.fromArn() for a DynamoDB table). Use the CfnOutput construct to print the ARN or other required value in the output of cdk deploy, or look in the AWS console. Or the second app can parse the CloudFormation template generated by the first app and retrieve that value from the Outputs section.
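
As a concrete illustration of that guidance, here is a minimal sketch (not part of the original answer) of importing an externally-defined DynamoDB table by a hardcoded ARN; the table name and ARN are made up:

import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';

// inside a Stack constructor: import an existing table by its (hardcoded) ARN
const sharedTable = dynamodb.Table.fromTableArn(
  this,
  'ImportedTable',
  'arn:aws:dynamodb:us-east-1:111111111111:table/shared-table'
);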

Hardcoding ARNs in code is sometimes worse and sometimes better than alternatives such as SSM Parameters, Secrets or CfnOutputs.

Edit: Handle multi-environment config with a Configuration Factory

All apps have app-level config items (e.g. defaultInstanceSize), which often differ by environment: prod accounts need full-powered resources, dev accounts don't. Consider encapsulating (non-secret) config in a Configuration Factory. Its constructor receives an account and region and returns a plain configuration object, and stacks receive the config as props. A sketch of such a factory follows the usage example below.

// app.ts

// the config factory derives its values from the account and region of the active CLI profile
const { env, isProd, retainOnDelete, enableDynamoCache, defaultInstanceSize, repoName, branchName, githubConnectionArn } =
  new EnvConfigurator('SuperApp', process.env.CDK_DEFAULT_ACCOUNT, process.env.CDK_DEFAULT_REGION).config;

new SourceConfigPipelineStack(app, 'MyPipelineStack', {
  env,
  source: pipelines.CodePipelineSource.connection(repoName, branchName, {
    connectionArn: githubConnectionArn,
  }),
  terminationProtection: isProd,
});
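
The answer does not show the factory itself, so here is a minimal sketch of what an EnvConfigurator might look like. The class shape, the account-to-environment mapping and every concrete value below are illustrative assumptions, not the author's actual code:

// EnvConfigurator.ts - illustrative sketch of a configuration factory

import { Environment } from 'aws-cdk-lib';

interface AppConfig {
  env: Environment;
  isProd: boolean;
  retainOnDelete: boolean;
  enableDynamoCache: boolean;
  defaultInstanceSize: string;
  repoName: string;
  branchName: string;
  githubConnectionArn: string;
}

export class EnvConfigurator {
  readonly config: AppConfig;

  // appName could namespace per-app settings; it is unused in this sketch
  constructor(appName: string, account?: string, region?: string) {
    // decide which environment the target account represents (account ids are placeholders)
    const isProd = account === '111111111111';

    this.config = {
      env: { account, region },
      isProd,
      retainOnDelete: isProd,
      enableDynamoCache: isProd,
      defaultInstanceSize: isProd ? 'm5.large' : 't3.small',
      repoName: 'user/example4-be',
      branchName: 'main',
      githubConnectionArn: isProd
        ? 'arn:aws:codestar-connections:us-east-1:111111111111:connection/prod-connection-id'
        : 'arn:aws:codestar-connections:us-east-1:222222222222:connection/dev-connection-id',
    };
  }
}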

The local config pattern has several advantages:

  1. Config values are easily discoverable and centralised in a single place
  2. Callers can be allowed to provide type-constrained overrides
  3. Tests can easily assert against configuration values
  4. Config values are under version control
  5. Pipeline-friendly: avoids cross-account permission headaches

Local config can be used alongside SSM Parameters, CfnOutputs and Secrets, which have complementary advantages; a typical app uses several of them. Reasonable people can disagree about where exactly to draw the boundaries.
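
For example, a minimal sketch (parameter and output names are assumed) of mixing local config with the other mechanisms inside a stack:

import * as cdk from 'aws-cdk-lib';
import * as ssm from 'aws-cdk-lib/aws-ssm';

// inside a Stack constructor: read a deploy-time value from Parameter Store...
const connectionArn = ssm.StringParameter.valueForStringParameter(
  this,
  '/codestar/connection_arn' // illustrative parameter name
);

// ...and export a value for other stacks or apps to consume
new cdk.CfnOutput(this, 'PipelineConnectionArn', { value: connectionArn });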


¹ The fundamental CDK pattern is Construct composition: "Composition is the key pattern for defining higher-level abstractions through constructs... In general, composition is preferred over inheritance when developing AWS CDK constructs." In this case, it makes sense to subclass Stack rather than the Construct base class, because the OP's use case is a cloned repo where, presumably, the deploy stages are non-optionally encapsulated in the stack.
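
For contrast, a composition-based version (names are illustrative, not from the answer) would wrap the pipeline in a reusable construct instead of subclassing Stack:

// SourcedPipeline.ts - illustrative composition alternative

import { Construct } from 'constructs';
import * as pipelines from 'aws-cdk-lib/pipelines';

export interface SourcedPipelineProps {
  readonly source: pipelines.CodePipelineSource;
}

export class SourcedPipeline extends Construct {
  readonly pipeline: pipelines.CodePipeline;

  constructor(scope: Construct, id: string, props: SourcedPipelineProps) {
    super(scope, id);

    // the consuming Stack composes this construct and injects the source
    this.pipeline = new pipelines.CodePipeline(this, 'Pipeline', {
      synth: new pipelines.CodeBuildStep('Synth', {
        input: props.source,
        commands: ['npm ci', 'npx cdk synth'],
      }),
    });
  }
}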

fedonev
  • But `app.ts` is still in the repo. If someone forks the repo for example, and `cdk deploys` to a different AWS account (which would be true if each developer had their own, which is best practice, isn't it), all those values would be incorrect. They would need to be changed and committed, making that repo unmergeable. – Pablo Fernandez Jan 25 '22 at 22:46
  • @pupeno Your comment suggests you have a team-development scenario in mind: _How can each team member have her own dev-env pipeline instance, without interfering with the repo's test-prod env configurations?_ This is a common requirement. Is this the context of your question? – fedonev Jan 26 '22 at 08:25
  • Sort of. To me hardcoding these values in the code feels bad, and there are two scenarios in which I can point to where it is a problem: a team, or an open source application deployed by different people. But I'm not trying to achieve those things. I'm developing on my own, I just don't like painting myself in the corner with bad practices. – Pablo Fernandez Jan 26 '22 at 09:05
  • @pupeno Hardcoded ARNs "bad"? [AWS guidance says otherwise](https://docs.aws.amazon.com/cdk/v2/guide/best-practices.html#best-practices-apps): _"use a static `from` method to import an externally-defined resource based on its ARN... use .. the ARN [from the] output of `cdk deploy` or look in the AWS console"_. – fedonev Jan 27 '22 at 10:01
  • @pupeno No solution is _always_ "good" or "bad". One can cherry-pick a scenario for which _literally anything_ is "bad". That's why [SO emphasizes](https://stackoverflow.com/help/dont-ask) "reasonably scoped", "answerable" OPs. – fedonev Jan 27 '22 at 10:10

If you want to keep this information out of the repo, you can create SSM parameters in a separate stack, deploy it and populate the parameters, then do a synth-time lookup in the pipeline.

Here's how it would look in Python:


# imports assume CDK v1, matching the cdk.Construct signature used below
from aws_cdk import core as cdk
from aws_cdk import aws_codestarconnections as csc
from aws_cdk import aws_ssm as ssm


class ParametersStack(cdk.Stack):
    def __init__(self, scope: cdk.Construct, construct_id: str, **kwargs):
        super().__init__(scope, construct_id, **kwargs)

        # the connection must still be completed (GitHub handshake) after deployment
        codestar_connection = csc.CfnConnection(
            self, "my_connection", connection_name="my_connection", provider_type="GitHub"
        )

        ssm.StringParameter(
            self,
            "codestar_arn",
            string_value=codestar_connection.ref,
            parameter_name="/codestar/connection_arn",
        )

        # placeholder values; overwrite them after the stack is deployed
        ssm.StringParameter(
            self,
            "repo_owner",
            string_value="REPO_OWNER",
            parameter_name="/github/repo_owner",
        )
        ssm.StringParameter(
            self,
            "main_repo_name",
            string_value="MAIN_REPO_NAME",
            parameter_name="/github/repo_name",
        )

You'd then deploy this stack, set up the connection, and populate the repo owner and name parameters.

In the pipeline stack:


github_repo_owner = ssm.StringParameter.value_from_lookup(
    self, "/github/repo_owner"
)
github_repo_name = ssm.StringParameter.value_from_lookup(
    self, "/github/repo_name"
)

# The following is needed because during the first synth, the values will be
# filled with dummy values that are incompatible, so just replace them with
# dummy values that will synth.
# See https://github.com/aws/aws-cdk/issues/8699

if "dummy" in github_repo_owner:
    github_repo_owner = "dummy"
if "dummy" in github_repo_name:
    github_repo_name = "dummy"

repo_string = f"{github_repo_owner}/{github_repo_name}"

codestar_connection_arn = ssm.StringParameter.value_from_lookup(
    self, "/codestar/connection_arn"
)

source = pipelines.CodePipelineSource.connection(
    repo_string=repo_string,
    branch=branch_name,
    connection_arn=codestar_connection_arn,
)

You also need to give the pipeline the right to perform the lookups during synth. You do this by allowing the synth action's role to assume the CDK bootstrap lookup role:

synth_step = pipelines.CodeBuildStep(
    "synth",
    install_commands=[
        "npm install -g aws-cdk",
        "pip install -r requirements.txt",
    ],
    commands=[
        "cdk synth",
    ],
    input=source,
    role_policy_statements=[
        iam.PolicyStatement(
            effect=iam.Effect.ALLOW,
            actions=["sts:AssumeRole"],
            resources=["*"],
            conditions={
                "StringEquals": {
                    "iam:ResourceTag/aws-cdk:bootstrap-role": "lookup"
                }
            },
        ),
    ],
)

The looked-up values will be cached in cdk.context.json. If you don't commit that file to your VCS, the pipeline will perform the lookups and fetch the actual values on every synth.

gshpychka