How to synchronize sequentially different jobs in a single YAML build?

Question

We use Azure DevOps Server 2020 (on-prem).

Rationale

Our PR build takes a long time. With the msbuild /m:4 flag it takes about 45 minutes to build from clean. For the purpose of my tests, the pipeline is configured to clean the outputs and run git clean before the build.

(The actual PR build only cleans the outputs, but does not clean the repository)

I am trying to figure out the optimal combination of the msbuild maxCpuCount parameter and the number of concurrent PR builds we can run on the same build server (i.e. the number of build agents).

I have a dedicated build server that no one else is using, with 6 agents configured on it. This is my Subject Under Test (SUT). I also have a driver build running on another build server, which queues builds on the SUT using the REST API.

I know how to queue a PR build with an arbitrary maxCpuCount value. The question is how to queue concurrent builds. On the surface, I could just execute the same REST API call again, and the next available agent in the SUT pool would pick the build up and run it. But the driver build also outputs progress information. It does not queue just one build, but N builds one after another, waiting (via the REST API) for the previous build to finish. It also cleans up after failed builds, restarts them, and so on. The driver build output is used to troubleshoot the driver build itself (which is code and thus subject to bugs).
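For reference, one iteration of the driver (queue a build, then wait for it to finish) could look roughly like the step below. This is only a sketch under assumptions that are not part of my actual setup: the definition id 123 is made up, the PR build is assumed to read a queue-time variable named maxCpuCount, and the job access token is assumed to have permission to queue builds. Routing the build to the SUT pool is not shown.

```yaml
steps:
  - powershell: |
      # Hypothetical values: definition id 123 and a queue-time variable named maxCpuCount.
      $base = $env:SYSTEM_COLLECTIONURI + $env:SYSTEM_TEAMPROJECT + "/_apis/build/builds"
      $headers = @{ Authorization = "Bearer " + $env:SYSTEM_ACCESSTOKEN }
      $body = @{
        definition = @{ id = 123 }
        # The REST API expects queue-time variables as a JSON string.
        parameters = (@{ maxCpuCount = "3" } | ConvertTo-Json -Compress)
      } | ConvertTo-Json

      # Queue one PR build.
      $queueUri = $base + "?api-version=6.0"
      $build = Invoke-RestMethod -Method Post -Uri $queueUri -Headers $headers -ContentType "application/json" -Body $body
      $id = $build.id
      Write-Host "Queued build $id"

      # Poll until the build reports the 'completed' status.
      $statusUri = $base + "/" + $id + "?api-version=6.0"
      do {
        Start-Sleep -Seconds 30
        $build = Invoke-RestMethod -Uri $statusUri -Headers $headers
        $status = $build.status
        Write-Host "Build $id status: $status"
      } while ($status -ne "completed")
      $result = $build.result
      Write-Host "Build $id result: $result"
    displayName: Queue one PR build on the SUT and wait for it
    env:
      SYSTEM_ACCESSTOKEN: $(System.AccessToken)
```

The real driver repeats this N times and adds the cleanup and restart logic, which is why its output needs to stay readable.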

So I dropped the options of running multiple PowerShell jobs, runspaces or threads, because making sense of their combined output would be very problematic.

Instead I opted to define multiple jobs within the driver build itself. So, if 5 driver build jobs run at the same time, that means I am running 5 PR builds concurrently on the SUT.

Finally, why a driver build in the first place? Why not a long-running console script? Several reasons:

My desktop machine is often rebooted at night.
The server machine logs out after 15 minutes of inactivity. I did not want to install a mouse jiggler tool, as I am not sure I am allowed to.
Arranging the driver code as a Windows Service or Scheduled Task does not make monitoring its progress as convenient as Azure DevOps does.

The question

There are two dimensions to my test:

Max Degree of Parallelism or MaxDOP - maps to the msbuild maxCpuCount parameter
Agent Count - the number of the SUT build agents engaged in the test

Due to the way I designed it (explained above), the Agent Count defines the number of concurrently running jobs within the driver build.

I want to implement the following test matrix:

MaxDOP X AgentCount = { 4,3,2,1 } X { 3, 4, 5, 6 }

So for example, if MaxDOP = 3 and AgentCount = 5, then I will queue 5 PR builds on the SUT with /m:3. Having 6 agents in total means these 5 PR builds would run concurrently.

And I want to do it for all the combinations of 4 >= MaxDOP >= 1 and 3 <= AgentCount <= 6

So I came up with the following driver build:

parameters:
  - name: ctx
    type: object
    default:
      - agentCount: 3
        maxDOP: 4
        dependsOn: Prepare
      - agentCount: 3
        maxDOP: 3
        dependsOn: MaxDOP_4x3_Agents
      - agentCount: 3
        maxDOP: 2
        dependsOn: MaxDOP_3x3_Agents
      - agentCount: 3
        maxDOP: 1
        dependsOn: MaxDOP_2x3_Agents
      - agentCount: 4
        maxDOP: 4
        dependsOn: MaxDOP_1x3_Agents
      - agentCount: 4
        maxDOP: 3
        dependsOn: MaxDOP_4x4_Agents
      - agentCount: 4
        maxDOP: 2
        dependsOn: MaxDOP_3x4_Agents
      - agentCount: 4
        maxDOP: 1
        dependsOn: MaxDOP_2x4_Agents
      - agentCount: 5
        maxDOP: 4
        dependsOn: MaxDOP_1x4_Agents
      - agentCount: 5
        maxDOP: 3
        dependsOn: MaxDOP_4x5_Agents
      - agentCount: 5
        maxDOP: 2
        dependsOn: MaxDOP_3x5_Agents
      - agentCount: 5
        maxDOP: 1
        dependsOn: MaxDOP_2x5_Agents
      - agentCount: 6
        maxDOP: 4
        dependsOn: MaxDOP_1x5_Agents
      - agentCount: 6
        maxDOP: 3
        dependsOn: MaxDOP_4x6_Agents
      - agentCount: 6
        maxDOP: 2
        dependsOn: MaxDOP_3x6_Agents
      - agentCount: 6
        maxDOP: 1
        dependsOn: MaxDOP_2x6_Agents

jobs:
  - job: Prepare
    steps:
      - Some preparation work

  - ${{ each ctx in parameters.ctx }}:
      - job: MaxDOP_${{ ctx.maxDOP }}x${{ ctx.agentCount }}_Agents
        variables:
          AgentCount: ${{ ctx.agentCount }}
        strategy:
          parallel: ${{ ctx.agentCount }}
        timeoutInMinutes: 60000
        dependsOn: ${{ ctx.dependsOn }}
        steps:
          - Queue the PR build on the SUT using ${{ ctx.maxDOP }} for maxCpuCount. The parallel strategy takes care to scale it ${{ ctx.agentCount }} times

  - job: Cleanup
    dependsOn:
      - MaxDOP_1x6_Agents
    condition: always()
    steps:
      - Some cleanup work

The challenge is to synchronize the jobs, because I do not want jobs for different test matrix cells (for example, with different MaxDOP values) to run concurrently. For example, take the job MaxDOP_3x4_Agents:

MaxDOP = 3
AgentCount = 4
Must run after MaxDOP_4x4_Agents

And of course:

the first job MaxDOP_4x3_Agents must run after the Prepare job
the Cleanup job must run after the last job MaxDOP_1x6_Agents

I failed to find a way to express these semantics without explicitly coding all 16 test matrix cells. Ignoring the dependsOn requirement, two nested for loops do the job very easily:

parameters:
  - name: agentCounts
    type: object
    default: [3, 4, 5, 6]
  - name: maxDOPs
    type: object
    default: [4, 3, 2, 1]

...

  - ${{ each agentCount in parameters.agentCounts }}:
      - ${{ each maxDOP in parameters.maxDOPs }}:
          - job: MaxDOP_${{ maxDOP }}x${{ agentCount }}_Agents
            variables:
              AgentCount: ${{ agentCount }}
            strategy:
              parallel: ${{ agentCount }}
            timeoutInMinutes: 60000
            dependsOn: ????
            steps:
...

Alas, I do not know how to specify dependsOn here.
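One direction that might sidestep dependsOn altogether is to wrap each matrix cell in its own stage, since stages without an explicit dependsOn run one after another in the order they are defined. The following is only a sketch, assuming stages are available on this server version. The job name Run and the echo placeholder steps are made up for illustration.

```yaml
parameters:
  - name: agentCounts
    type: object
    default: [3, 4, 5, 6]
  - name: maxDOPs
    type: object
    default: [4, 3, 2, 1]

stages:
  - stage: Prepare
    jobs:
      - job: Prepare
        steps:
          - script: echo "Some preparation work"

  # One stage per matrix cell. With no dependsOn, each stage implicitly
  # depends on the stage defined just before it.
  - ${{ each agentCount in parameters.agentCounts }}:
      - ${{ each maxDOP in parameters.maxDOPs }}:
          - stage: MaxDOP_${{ maxDOP }}x${{ agentCount }}_Agents
            jobs:
              - job: Run  # hypothetical job name
                variables:
                  AgentCount: ${{ agentCount }}
                strategy:
                  parallel: ${{ agentCount }}
                timeoutInMinutes: 60000
                steps:
                  - script: echo "Queue the PR build on the SUT with /m:${{ maxDOP }}"

  - stage: Cleanup
    condition: always()
    jobs:
      - job: Cleanup
        steps:
          - script: echo "Some cleanup work"
```

A caveat with this layout: by default a stage runs only if the previous one succeeded, so a failed cell would cause the remaining cells to be skipped unless each stage gets its own condition.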

Some snapshots of the driver build (ignore the durations; these are currently dry runs, and no actual PR builds are invoked on the SUT):

Using a small test matrix:

What if you floated the `MaxDOP` out of the strategy and iterate the `job` with it? So, you'd get a new job for each `MaxDOP` setting, each job would depend on the previous, and then your strategy would cover the agent counts? — WaitingForGuacamole, Nov 29 '21 at 14:15
@WaitingForGuacamole - isn't it what I am already doing? If you have a concrete idea would it be possible for you to arrange your comment as an answer and provide a code snippet? — mark, Nov 29 '21 at 15:11
