I have three Azure Pipelines agents built on Ubuntu 18.04 images and deployed to a Kubernetes cluster. The agents are running the latest version, 2.182.1, but this problem also occurred with 2.181.0.

Executing build pipelines individually works just fine; the build completes successfully every time. But whenever a second pipeline starts while another pipeline is already running, it fails - every time - on the "Checkout" step with the following error:

The working folder U:\azp\agent\_work\1\s is already in use by the workspace ws_1_34;Project Collection Build Service (myaccount) on computer linux-agent-deployment-78bfb76d.

These are three separate and distinct agents running as separate containers. Why would a job from one container be impacting a job running on a different container? Concurrent builds work all day long on my non-container Windows servers.

The container agents are deployed as a standard Kubernetes "deployment" object:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: linux-agent
  name: linux-agent-deployment
  namespace: pipelines
  annotations:
    kubernetes.io/change-cause: "update agent image to 20210304 - change from OpenJDK to Oracle Java JDK 11"
spec:
  replicas: 3
  revisionHistoryLimit: 3
  selector:
    matchLabels:
      app: linux-agent
  strategy:
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: linux-agent
    spec:
      serviceAccountName: sa-aws-azp-pipelineagent
      containers:
        - name: linux-agent
          image: 999999999999.dkr.ecr.us-east-2.amazonaws.com/mgmt/my-linux-agent:20210304
          imagePullPolicy: IfNotPresent
          env:
            - name: AZP_URL
              value: https://dev.azure.com/myaccount
            - name: AZP_POOL
              value: EKS-Linux
            - name: AZP_TOKEN
              valueFrom:
                secretKeyRef:
                  name: azure-devops
                  key: agent-token

My build agent containers are pretty straightforward...

FROM ubuntu:18.04

ENV ACCEPT_EULA=y
ENV DEBIAN_FRONTEND=noninteractive
RUN echo "APT::Get::Assume-Yes \"true\";" > /etc/apt/apt.conf.d/90assumeyes
RUN ln -fs /usr/share/zoneinfo/America/Chicago /etc/localtime

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        apt-transport-https \
        ca-certificates \
        curl \
        jq \
        git \
        iputils-ping \
        libcurl4 \
        libicu60 \
        libunwind8 \
        netcat \
        dnsutils \
        wget \
        zip \
        unzip \
        telnet \
        ftp \
        file \
        time \
        tzdata \
        build-essential \
        libc6 \
        libgcc1 \
        libgssapi-krb5-2 \
        liblttng-ust0 \
        libssl1.0 \
        libstdc++6 \
        zlib1g \
        apt-utils \
        bison \
        brotli \
        bzip2 \
        dbus \
        dpkg \
        fakeroot \
        flex \
        gnupg2 \
        iproute2 \
        lib32z1 \
        libc++-dev \
        libc++abi-dev \
        libgbm-dev \
        libgconf-2-4 \
        libgtk-3-0 \
        libsecret-1-dev \
        libsqlite3-dev \
        libxkbfile-dev \
        libxss1 \
        locales \
        m4 \
        openssh-client \
        parallel \
        patchelf \
        pkg-config \
        rpm \
        rsync \
        shellcheck \
        sqlite3 \
        ssh \
        sudo \
        texinfo \
        tk \
        upx \
        xorriso \
        xvfb \
        xz-utils \
        zstd \
        zsync \
        software-properties-common

### REQUIRED APPLICATIONS

# Amazon Web Services - CLI
RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" \
    && unzip awscliv2.zip \
    && sudo ./aws/install

# MS SQL Tools  (ONE-TIME SETUP OF MICROSOFT REPOSITORY INCLUDED)
RUN curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add - \
    && curl https://packages.microsoft.com/config/ubuntu/18.04/prod.list | sudo tee /etc/apt/sources.list.d/msprod.list \
    && sudo apt-get update && sudo ACCEPT_EULA=Y apt-get install -y mssql-tools unixodbc-dev

# PowerShell Global Tool (https://learn.microsoft.com/en-us/powershell/scripting/install/installing-powershell-core-on-linux?view=powershell-7.1)
RUN sudo apt-get install -y powershell

# .NET Core SDKs (https://learn.microsoft.com/en-us/dotnet/core/install/linux-ubuntu)
#       see also (https://packages.microsoft.com/ubuntu/18.04/prod/dists/bionic/main/binary-amd64/) "Packages"
#       SDKs Included: 2.1, 2.2, 3.0, 3.1, 5.0
RUN sudo apt-get install -y dotnet-host \
    aspnetcore-store-2.0.0 \
    aspnetcore-store-2.0.3 \
    aspnetcore-store-2.0.5 \
    aspnetcore-store-2.0.6 \
    aspnetcore-store-2.0.7 \
    aspnetcore-store-2.0.8 \
    aspnetcore-store-2.0.9 \
    dotnet-hostfxr-2.0.7 \
    dotnet-hostfxr-2.0.9 \
    dotnet-hostfxr-2.1 \
    dotnet-hostfxr-2.2 \
    dotnet-hostfxr-3.0 \
    dotnet-hostfxr-3.1 \
    dotnet-hostfxr-5.0 \
    dotnet-runtime-deps-2.1 \
    dotnet-runtime-deps-2.2 \
    dotnet-runtime-deps-3.0 \
    dotnet-runtime-deps-3.1 \
    dotnet-runtime-deps-5.0 \
    dotnet-targeting-pack-3.0 \
    dotnet-targeting-pack-3.1 \
    dotnet-targeting-pack-5.0 \
    netstandard-targeting-pack-2.1 \
    aspnetcore-targeting-pack-3.0 \
    aspnetcore-targeting-pack-3.1 \
    aspnetcore-targeting-pack-5.0 \
    dotnet-runtime-2.1 \
    dotnet-runtime-2.2 \
    dotnet-runtime-3.0 \
    dotnet-runtime-3.1 \
    dotnet-runtime-5.0 \
    aspnetcore-runtime-2.1 \
    aspnetcore-runtime-2.2 \
    aspnetcore-runtime-3.0 \
    aspnetcore-runtime-3.1 \
    aspnetcore-runtime-5.0 \
    dotnet-sdk-2.1 \
    dotnet-sdk-2.2 \
    dotnet-sdk-3.0 \
    dotnet-sdk-3.1 \
    dotnet-sdk-5.0

# Initialize dotnet
RUN dotnet help
RUN dotnet --info

# Node.js (https://github.com/nodesource/distributions/blob/master/README.md)
RUN curl -sL https://deb.nodesource.com/setup_14.x | sudo -E bash - \
    && sudo apt-get install -y nodejs \
    && node --version \
    && npm --version

# Java JDK 11
COPY JDK/ /var/cache/oracle-jdk11-installer-local/
RUN add-apt-repository -y ppa:linuxuprising/java && \
    apt-get update && \
    echo oracle-java11-installer shared/accepted-oracle-license-v1-2 select true | sudo /usr/bin/debconf-set-selections && \
    apt-get install -y oracle-java11-installer-local

ENV JAVA_HOME=/usr/lib/jvm/java-11-oracle \
    JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF8

# Clean package cache
RUN rm -rf /var/lib/apt/lists/* \
    && rm -rf /etc/apt/sources.list.d/*

WORKDIR /azp

COPY ./start.sh .
RUN chmod +x start.sh

CMD ["./start.sh"]

What am I doing wrong?

Bryan
  • That's a Windows path in your error message. That said, if you have a Kubernetes pod with multiple agent containers running on the same machine, you'll need to ensure that they're truly running in an independent space, and not somehow pointed to a common network or host folder. – WaitingForGuacamole Mar 04 '21 at 20:39
  • Yes, Azure DevOps has always shown my paths that way -- as far as I can tell it's just a quirk of their platform. The containers are definitely Linux, however. I updated the post to show the actual Dockerfile and Kubernetes manifest contents I'm deploying. The containers are isolated - no mapped volumes or shared storage of any kind. – Bryan Mar 04 '21 at 21:19
  • The only thing I can think of is that somehow a single container is managing to run multiple instances of the AZP agent (which is not good unless they specify different work directories at startup). Can you show your `start.sh` script, please? – WaitingForGuacamole Mar 05 '21 at 13:36
  • I confirmed there was only one process running for each agent. My start.sh script is the one provided by Microsoft here: https://learn.microsoft.com/en-us/azure/devops/pipelines/agents/docker?view=azure-devops. I did finally find a solution to this, however. The k8s manifest should have been a StatefulSet rather than Deployment. – Bryan Mar 05 '21 at 18:03

1 Answer

Solution has been found. Here's how I resolved this for anyone coming across this post:

I discovered a helm chart for Azure Pipeline agents - emberstack/docker-azure-pipelines-agent - and after poking around in the contents, discovered what was staring me in the face the last couple of days: "StatefulSets"

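Simple, easy to test, and working well so far. I refactored my k8s manifest as a StatefulSet object, and the agents are up and able to run builds concurrently. Each pod now gets a stable, unique name (linux-pipeline-agent-0 through linux-pipeline-agent-2), which the manifest passes to the container as AZP_AGENT_NAME. Still more testing to do, but looking very positive at this point.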

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: linux-agent
  name: linux-pipeline-agent
  namespace: pipelines
  annotations:
    kubernetes.io/change-cause: "Init 20210304 - Oracle Java JDK 11"
spec:
  podManagementPolicy: Parallel
  replicas: 3
  revisionHistoryLimit: 3
  selector:
    matchLabels:
      app: linux-agent
  serviceName: agent-svc
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: linux-agent
    spec:
      serviceAccountName: sa-aws-azp-pipelineagent
      containers:
        - name: linux-agent
          image: 999999999999.dkr.ecr.us-east-2.amazonaws.com/mgmt/my-linux-agent:20210304
          imagePullPolicy: IfNotPresent
          env:
            - name: AZP_AGENT_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: AZP_URL
              value: https://dev.azure.com/myaccount
            - name: AZP_POOL
              value: EKS-Linux
            - name: AZP_TOKEN
              valueFrom:
                secretKeyRef:
                  name: azure-devops
                  key: agent-token
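
The `serviceName: agent-svc` field above refers to a governing Service that is not shown in this post. A minimal sketch of what it could look like follows, assuming it is not already defined elsewhere in the cluster. The name, namespace, and selector are taken from the StatefulSet. The headless `clusterIP: None` setting and the absence of ports are assumptions, based on the agents only making outbound connections to Azure DevOps.

```yaml
# Hypothetical headless Service backing the StatefulSet above.
# It exists only to give the pods a stable network identity;
# no ports are declared because nothing connects in to the agents.
apiVersion: v1
kind: Service
metadata:
  name: agent-svc          # must match spec.serviceName in the StatefulSet
  namespace: pipelines
  labels:
    app: linux-agent
spec:
  clusterIP: None          # headless: no virtual IP, no load balancing
  selector:
    app: linux-agent       # same label the StatefulSet pods carry
```

If a Service with that name already exists in the `pipelines` namespace, this fragment is unnecessary.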

Bryan
  • So, first and foremost, I'm glad you have a working cluster. However, a StatefulSet does not explain why your Pod had concurrency failures. I've done StatefulSets for agents before, and the biggest benefit to them IMO is the ability to have volumes that survive reboots, so that you can take advantage of cached resources. With Pods, the only way your containers could have concurrency violations is if they were sharing a volume. SMH on this one. Glad it's working, don't understand why. – WaitingForGuacamole Mar 05 '21 at 18:23
  • Also, since we're on the topic of StatefulSets for agents: again, great idea to keep your agents warm when scaling up! The only problem I ran into with k8s in general was how to scale agent pools - it's not really about CPU or memory consumption, it's about how many agents you need to maintain a desired build queue service rate. As a result, I went away from k8s completely and built a VM scale set agent pool in Azure (I know, you're in Amazon), but that allowed the pool to control the agent count based on busy-ness. – WaitingForGuacamole Mar 05 '21 at 18:29
  • I'm still trying to figure this out myself. But the best I can tell is that the individuality / stickiness guaranteed by k8s for StatefulSets is the key factor here. I'm not using any persistent volumes here - and that's what stumped me initially. However, the error I was getting with the Deployment configuration was a big flag that the agent is stateful. There must be something else under the hood that the uniqueness offered by StatefulSets is providing in this case - specifically something having to do with how storage works even without PVs. – Bryan Mar 05 '21 at 19:36