
Given an input, I have a cheap function and an expensive function; each of these is modeled as a Concourse task.

If two invocations of the cheap function have the same output, I know that two invocations of the expensive function will likewise have the same output.

How can I set up a pipeline that only runs the expensive function when the result of the cheap function changes?


For the sake of an example, let's say that the cheap function strips comments and whitespace from a codebase and then calculates a checksum, whereas the expensive function actually builds and runs the code. My goal, in this scenario, is to not bother building any revision that differs from the prior one only in comments or whitespace.
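To make that concrete, here's a rough sketch of such a cheap function (the file contents and sed patterns are illustrative stand-ins for real preprocessing; adapt to your language):

```shell
# Hypothetical "cheap function": strip comments and whitespace, then checksum.
set -eu
cd "$(mktemp -d)"
cat > main.c <<'EOF'
int main(void) { return 0; } /* a comment */
EOF
normalize() {
  # drop /* ... */ and // comments, collapse all whitespace runs
  sed -e 's@/\*.*\*/@@g' -e 's@//.*$@@' "$1" | tr -s '[:space:]' ' '
}
H1=$(normalize main.c | sha256sum | cut -d' ' -f1)
# a comment-only edit must not change the hash
sed -i 's@a comment@another comment@' main.c
H2=$(normalize main.c | sha256sum | cut -d' ' -f1)
echo "$H1" > hash.txt
```

A comment-only edit leaves the normalized text unchanged, so the two checksums come out identical.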

I've considered using a git resource and (in our example) storing a hash of the preprocessor output for each compilation target in a separate file, so the task doing the actual compilation (and applicable unit tests) can trigger on changes to the file holding the hash of its inputs. Having a separate git resource that maintains historical hashes indefinitely seems like overkill, though. Is there a better approach?
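As a sketch of that per-target hash-file idea (the target names are made up, and the sed call stands in for real preprocessor output such as cc -E):

```shell
# Record one hash line per compilation target; a downstream task could
# then trigger on changes to hashes.txt rather than on every commit.
set -eu
cd "$(mktemp -d)"
printf 'int x = 1; /* note */\n' > a.c
printf 'int y = 2;\n' > b.c
: > hashes.txt
for target in a.c b.c; do
  # stand-in for real preprocessor output (e.g. cc -E "$target")
  sum=$(sed 's@/\*.*\*/@@g' "$target" | sha256sum | cut -d' ' -f1)
  printf '%s %s\n' "$target" "$sum" >> hashes.txt
done
```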


This is similar to Have Concourse only build new docker containers on file diff not on commit, but I'm trying to test whether the result of running a function against a file changes, to trigger only on changes that could modify build results rather than all possible changes. (The proposal described above, creating an intermediary repo with outputs from the cheap function, would effectively be using the answers to that question as one of its components; but I'm hoping there's an option with fewer moving parts).

Charles Duffy

1 Answer


Consider using a put step nested inside the try: step modifier:

[diagram: conditional trigger]

The cheap job takes two inputs:

  • git repo with the code
  • hash of the last cheap computation

On every commit to code-repo, the cheap job reads the last-hash input (mapped from the hash resource) and compares it to the result of the cheap computation (in the toy example below, simply the contents of hash.txt checked into the root of code-repo).

If the hash from the incoming commit differs from the previously recorded value, the task writes the new value to hash/hash.txt, the output file named by the put step's file param; the put then records a new version of the hash resource, which in turn triggers the expensive job.

If no change is detected, the task creates no output file, so the put attempt fails; because it is wrapped in try:, the cheap job as a whole still succeeds.

resources:
  - name: code-repo
    type: git
    source:
      branch: master
      private_key: ((key))
      uri: git@github.com:myorg/code-repo.git

  - name: hash
    type: s3
    source:
      access_key_id: ((aws_access))
      secret_access_key: ((aws_secret))
      region_name: ((aws_region))
      bucket: my-versioned-aws-bucket
      versioned_file: hash/hash.txt

jobs:
  - name: cheap
    plan:
    - get: code-repo
      trigger: true
    - get: hash
    - task: check
      input_mapping:
        last-hash: hash
      config:
        platform: linux
        image_resource:
          type: docker-image
          source: { repository: alpine }
        inputs:
          - name: code-repo
          - name: last-hash
        outputs:
          - name: hash
        run:
          path: /bin/sh
          args:
          - -c
          - |
            # compare the previously recorded hash with the incoming one
            LAST="$(cat last-hash/hash.txt)"
            NEW="$(cat code-repo/hash.txt)"
            if [ "$LAST" != "$NEW" ]; then
              # create the output file only when the hash changed; if it
              # is absent, the later put fails (harmlessly, inside try:)
              cp code-repo/hash.txt hash/hash.txt
            fi
      on_success:
        try:
          put: hash
          params:
            file: hash/hash.txt

  - name: expensive
    plan:
    - get: hash
      trigger: true
      passed: [ cheap ]

Note: you must seed the initial state file in S3 with some value, or the cheap job will never run (its get of the hash resource will find no versions).
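One way to do that seeding (an untested sketch; the bucket and path match the example pipeline above, and working aws CLI credentials are assumed):

```shell
# One-time seed of the S3 state file so the first "cheap" run has
# something to read. Any placeholder value will do.
echo "bootstrap" > hash.txt
if command -v aws >/dev/null 2>&1; then
  aws s3 cp hash.txt s3://my-versioned-aws-bucket/hash/hash.txt \
    || echo "upload failed; check credentials and bucket name"
else
  echo "aws CLI not found; upload hash.txt manually"
fi
```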