
I'm trying to run remote build execution with Bazel Buildfarm memory workers on our k8s cluster.

I've set up the server pods, worker pods, and a Redis cluster, as buildfarm's architecture requires, along with k8s Services and Ingresses so that I can send builds remotely.

However, when I tried to run a build, I got the following:

eito@fuji:~/MyRepo$ bazel --client_debug run //tools:ipython3 --config=rbe
[INFO 11:03:07.374 src/main/cpp/option_processor.cc:407] Looking for the following rc files: /etc/bazel.bazelrc,/home/eito/MyRepo/.bazelrc,/home/eito/.bazelrc
[INFO 11:03:07.374 src/main/cpp/rc_file.cc:56] Parsing the RcFile /home/eito/MyRepo/.bazelrc
[INFO 11:03:07.374 src/main/cpp/rc_file.cc:56] Parsing the RcFile user.bazelrc
[INFO 11:03:07.374 src/main/cpp/rc_file.cc:129] Skipped optional import of user.bazelrc, the specified rc file either does not exist or is not readable.
[INFO 11:03:07.374 src/main/cpp/rc_file.cc:56] Parsing the RcFile /home/eito/.bazelrc
[INFO 11:03:07.374 src/main/cpp/blaze.cc:1626] Debug logging requested, sending all client log statements to stderr
[INFO 11:03:07.374 src/main/cpp/blaze.cc:1509] Acquired the client lock, waited 0 milliseconds
[INFO 11:03:07.377 src/main/cpp/blaze.cc:1697] Trying to connect to server (timeout: 30 secs)...
[INFO 11:03:07.385 src/main/cpp/blaze.cc:1264] Connected (server pid=113490).
[INFO 11:03:07.385 src/main/cpp/blaze.cc:1974] Releasing client lock, let the server manage concurrent requests.
INFO: Invocation ID: c97091ec-e335-4882-8107-c9084d4453ff
ERROR: Failed to query remote execution capabilities: connection timed out: buildfarm.dev.azr.internal.mydomain.com/172.33.33.99:8980
[INFO 11:03:37.613 src/main/cpp/blaze.cc:2093] failure_detail: message: "Failed to query remote execution capabilities: connection timed out: buildfarm.dev.azr.internal.mydomain.com/172.33.33.99:8980"
remote_execution {
  code: CAPABILITIES_QUERY_FAILURE
}

My worker Deployment and Service look like this (the server's are very similar, just with a different image and a different ConfigMap mounted):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: aks-buildfarm-worker
  namespace: infrastructure--buildfarm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: aks-buildfarm
  template:
    metadata:
      labels:
        app: aks-buildfarm
        role: app
    spec:
      containers:
      - name: buildfarm-worker
        image: mydomain.azurecr.io/buildfarm-memory-worker:v8
        volumeMounts:
          - mountPath: "/config"
            name: buildfarm-worker-config
        ports:
        - containerPort: 8980
          protocol: TCP
        resources:
          limits:
            memory: 256Mi
            cpu: "300m"
      volumes:
      - name: buildfarm-worker-config
        configMap:
          name: buildfarm-worker-config
---
apiVersion: v1
kind: Service
metadata:
  name: aks-buildfarm
  namespace: infrastructure--buildfarm
spec:
  type: ClusterIP
  ports:
    - protocol: TCP
      name: grpc
      port: 8980
      targetPort: 8980
  selector:
    app: aks-buildfarm


I'm mostly using the following example configs, deployed as ConfigMaps on k8s:

https://github.com/bazelbuild/bazel-buildfarm/blob/main/examples/shard-server.config.example
https://github.com/bazelbuild/bazel-buildfarm/blob/main/examples/worker.config.example

The only difference is that I replaced every localhost:8980 in the worker config with "aks-buildfarm-server.infrastructure--buildfarm.svc.cluster.local", since the server and the workers are in the same k8s cluster and can reach each other through that name.

My Ingress looks like this:

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  namespace: infrastructure--buildfarm
  name: buildfarm-ingress
  annotations:
    kubernetes.io/ingress.class: nginx-internal
    nginx.ingress.kubernetes.io/rewrite-target: /$1
    nginx.ingress.kubernetes.io/use-regex: "true"
    cert-manager.io/cluster-issuer: selfsigned-cluster-issuer
spec:
  rules:
  - host: buildfarm.dev.azr.internal.mydomain.com
    http:
      paths:
      - backend:
          serviceName: aks-buildfarm
          servicePort: 8980
        path: /(.*)

My .bazelrc file looks like this:

build:rbe --remote_executor=grpcs://buildfarm.dev.azr.internal.mydomain.com:8980

Ken White
M80

1 Answer


You need to use the shard worker config from here: https://github.com/bazelbuild/bazel-buildfarm/blob/main/examples/shard-worker.config.example

You will also need a running Redis instance or cluster, as the two-way communication between the server and the worker is triggered via Redis.
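
Since the error in the question is a plain "connection timed out" on port 8980, it may also help to confirm that each endpoint is reachable at the TCP level before debugging the buildfarm configs themselves. The following is only a minimal sketch using Python's standard library; the helper name can_connect is made up for illustration, and it tests nothing beyond whether a TCP connection can be opened (no gRPC, TLS, or Redis protocol checks).

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration; it is not part of Bazel or buildfarm.
    """
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS resolution failures, refused connections, and timeouts.
        return False
```

For example, can_connect("buildfarm.dev.azr.internal.mydomain.com", 8980) run from the client machine checks the address used in the .bazelrc, and can_connect("aks-buildfarm-server.infrastructure--buildfarm.svc.cluster.local", 8980) run from inside a worker pod checks the server Service. The same call against your Redis host on port 6379 (the Redis default; substitute whatever host and port your deployment actually uses) checks the link the server and worker depend on.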

Tom Zayouna