We use Azure DevOps Server 2020 (on-prem).
Rationale
Our PR build takes a long time. With the msbuild /m:4 flag it takes about 45 minutes to build from clean. For the purpose of my tests, the pipeline is configured to clean the outputs and run git clean before the build.
(The actual PR build only cleans the outputs, but does not clean the repository.)
I am trying to figure out the optimal combination of the msbuild maxCpuCount parameter and the number of concurrent PR builds we can run on the same build server (i.e. the number of build agents).
I have a dedicated build server that no one else is using, and I configured 6 agents on it. This is my Subject Under Test (SUT). Next, I have a driver build running on another build server, which queues the builds on the SUT using the REST API.
I know how to queue a PR build with an arbitrary maxCpuCount value. The question is how to queue concurrent builds. On the surface, I could just execute the same REST API call again, and the next available agent in the SUT pool would pick the build up and run it. But the driver build also outputs some progress information. It does not actually queue one build, but N builds one after another, waiting for each build to finish before queuing the next (using the REST API). It cleans up after failed builds and restarts them, etc. The driver build output is used to troubleshoot the driver build itself (which is code and thus subject to bugs).
And so I dropped the options of running multiple PowerShell jobs, PS runspaces or threads, because making sense of their combined output would be very problematic.
Instead I opted to define multiple jobs within the driver build itself. So, if 5 driver build jobs run at the same time, that means I am running 5 PR builds concurrently on the SUT.
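To make the behaviour of a single driver job concrete, here is a minimal sketch in Python of the "queue, wait, clean up and restart on failure" loop described above. The actual driver code is not shown in this question, so everything here is hypothetical: queue_build, wait_for_result and cleanup stand in for the REST API calls, and the retry limit and the result strings are assumptions.

```python
import itertools


def run_sequentially(n_builds, queue_build, wait_for_result, cleanup,
                     max_retries=2, log=print):
    """Queue n_builds one after another, waiting for each to finish.

    queue_build()          -> build id            (hypothetical REST wrapper)
    wait_for_result(id)    -> "succeeded"/"failed" (hypothetical REST wrapper)
    cleanup(id)            -> None                 (hypothetical cleanup step)
    """
    completed = []
    for i in range(1, n_builds + 1):
        # One initial attempt plus up to max_retries restarts.
        for attempt in range(1, max_retries + 2):
            build_id = queue_build()
            log(f"[{i}/{n_builds}] queued build {build_id} (attempt {attempt})")
            result = wait_for_result(build_id)
            log(f"[{i}/{n_builds}] build {build_id} finished: {result}")
            if result == "succeeded":
                completed.append(build_id)
                break
            cleanup(build_id)  # clean up after the failed build, then restart
        else:
            raise RuntimeError(f"build {i} kept failing after {max_retries} retries")
    return completed


# Dry run with fake callables: the second build fails once and is restarted.
_ids = itertools.count(100)
_results = iter(["succeeded", "failed", "succeeded"])
cleaned = []
done = run_sequentially(
    2,
    queue_build=lambda: next(_ids),
    wait_for_result=lambda build_id: next(_results),
    cleanup=cleaned.append,
)
```

Because each driver job runs this loop on its own and writes to its own log, the progress output of concurrent PR builds never interleaves.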
Finally, why a driver build in the first place? Why not a long-running console script? Several reasons:
- My desktop machine is often rebooted at night
- The server machine logs out after 15 minutes of inactivity. I did not want to install a mouse jiggler tool - not sure I am allowed.
- Arranging the driver code as a Windows Service or Scheduled Task does not provide the same level of convenience for monitoring its progress as Azure DevOps.
The question
There are two dimensions to my test:
- Max Degree of Parallelism or MaxDOP - maps to the msbuild maxCpuCount parameter
- Agent Count - the number of the SUT build agents engaged in the test
Due to the way I designed it (explained above) the Agent Count defines the number of the concurrently running jobs within the driver build.
I want to implement the following test matrix:
MaxDOP X AgentCount = { 4,3,2,1 } X { 3, 4, 5, 6 }
So for example, if MaxDOP = 3 and AgentCount = 5, then I will queue 5 PR builds on the SUT with /m:3. Having 6 agents in total means these 5 PR builds would run concurrently.
And I want to do it for all the combinations of 4 >= MaxDOP >= 1 and 3 <= AgentCount <= 6.
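As a quick illustration of the matrix, the 16 cells can be enumerated with a Cartesian product. This is only a sketch in Python, not part of the pipeline; the constant and function names are made up, and the run order (agent count as the outer loop) is one possible choice.

```python
from itertools import product

MAX_DOPS = [4, 3, 2, 1]      # msbuild maxCpuCount values
AGENT_COUNTS = [3, 4, 5, 6]  # number of concurrent PR builds on the SUT


def test_matrix(agent_counts=AGENT_COUNTS, max_dops=MAX_DOPS):
    """Return (max_dop, agent_count) cells, with the agent count varying slowest."""
    return [(dop, agents) for agents, dop in product(agent_counts, max_dops)]


cells = test_matrix()
for dop, agents in cells:
    print(f"queue {agents} concurrent PR builds with /m:{dop}")
```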
So I came up with the following driver build:
parameters:
- name: ctx
  type: object
  default:
  - agentCount: 3
    maxDOP: 4
    dependsOn: Prepare
  - agentCount: 3
    maxDOP: 3
    dependsOn: MaxDOP_4x3_Agents
  - agentCount: 3
    maxDOP: 2
    dependsOn: MaxDOP_3x3_Agents
  - agentCount: 3
    maxDOP: 1
    dependsOn: MaxDOP_2x3_Agents
  - agentCount: 4
    maxDOP: 4
    dependsOn: MaxDOP_1x3_Agents
  - agentCount: 4
    maxDOP: 3
    dependsOn: MaxDOP_4x4_Agents
  - agentCount: 4
    maxDOP: 2
    dependsOn: MaxDOP_3x4_Agents
  - agentCount: 4
    maxDOP: 1
    dependsOn: MaxDOP_2x4_Agents
  - agentCount: 5
    maxDOP: 4
    dependsOn: MaxDOP_1x4_Agents
  - agentCount: 5
    maxDOP: 3
    dependsOn: MaxDOP_4x5_Agents
  - agentCount: 5
    maxDOP: 2
    dependsOn: MaxDOP_3x5_Agents
  - agentCount: 5
    maxDOP: 1
    dependsOn: MaxDOP_2x5_Agents
  - agentCount: 6
    maxDOP: 4
    dependsOn: MaxDOP_1x5_Agents
  - agentCount: 6
    maxDOP: 3
    dependsOn: MaxDOP_4x6_Agents
  - agentCount: 6
    maxDOP: 2
    dependsOn: MaxDOP_3x6_Agents
  - agentCount: 6
    maxDOP: 1
    dependsOn: MaxDOP_2x6_Agents

jobs:
- job: Prepare
  steps:
  - Some preparation work

- ${{ each ctx in parameters.ctx }}:
  - job: MaxDOP_${{ ctx.maxDOP }}x${{ ctx.agentCount }}_Agents
    variables:
      AgentCount: ${{ ctx.agentCount }}
    strategy:
      parallel: ${{ ctx.agentCount }}
    timeoutInMinutes: 60000
    dependsOn: ${{ ctx.dependsOn }}
    steps:
    - Queue the PR build on the SUT using ${{ ctx.maxDOP }} for maxCpuCount. The parallel strategy takes care to scale it ${{ ctx.agentCount }} times

- job: Cleanup
  dependsOn:
  - MaxDOP_1x6_Agents
  condition: always()
  steps:
  - Some cleanup work
The challenge is to synchronize the jobs. This is because I do not want jobs with different MaxDOP values to run concurrently. For example, take the job MaxDOP_3x4_Agents:
- MaxDOP = 3
- AgentCount = 4
- Must run after MaxDOP_4x4_Agents
And of course:
- the first job MaxDOP_4x3_Agents must run after the Prepare job
- the Cleanup job must run after the last job MaxDOP_1x6_Agents
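The ordering rules above amount to a single chain: every job depends on the job of the previous matrix cell, the first one depends on Prepare, and Cleanup depends on the last one. A minimal sketch in Python of that chain follows. It runs outside the pipeline, the helper names are hypothetical, and it only prints entries in the same shape as the ctx parameter.

```python
def job_name(max_dop, agent_count):
    """Job naming scheme used by the driver build."""
    return f"MaxDOP_{max_dop}x{agent_count}_Agents"


def build_ctx(agent_counts, max_dops, first_dependency="Prepare"):
    """Return (ctx entries, name of the last job) for the test matrix.

    Each entry depends on the job generated just before it, so no two
    matrix cells ever run at the same time.
    """
    ctx = []
    previous = first_dependency
    for agents in agent_counts:
        for dop in max_dops:
            ctx.append({"agentCount": agents, "maxDOP": dop, "dependsOn": previous})
            previous = job_name(dop, agents)
    return ctx, previous


ctx, last_job = build_ctx([3, 4, 5, 6], [4, 3, 2, 1])

# Print the entries as YAML list items for the ctx parameter default.
for entry in ctx:
    print(f"- agentCount: {entry['agentCount']}")
    print(f"  maxDOP: {entry['maxDOP']}")
    print(f"  dependsOn: {entry['dependsOn']}")
print(f"# Cleanup dependsOn: {last_job}")
```

This still spells out every cell in the resulting YAML; it only removes the manual bookkeeping of the dependsOn values.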
I failed to find a way to express these semantics without explicitly coding all 16 test matrix cells. Ignoring the dependsOn requirement, 2 nested for loops do the job very easily:
parameters:
- name: agentCounts
  type: object
  default: [3, 4, 5, 6]
- name: maxDOPs
  type: object
  default: [4, 3, 2, 1]
...
- ${{ each agentCount in parameters.agentCounts }}:
  - ${{ each maxDOP in parameters.maxDOPs }}:
    - job: MaxDOP_${{ maxDOP }}x${{ agentCount }}_Agents
      variables:
        AgentCount: ${{ agentCount }}
      strategy:
        parallel: ${{ agentCount }}
      timeoutInMinutes: 60000
      dependsOn: ????
      steps:
      ...
Alas, I do not know how to specify dependsOn here.
Some snapshots of the driver build (ignore the durations; these are currently dry runs, so no actual PR builds are invoked on the SUT):
Using a small test matrix: