I'm trying to run remote build execution with Bazel buildfarm memory workers on our k8s cluster.
I've set up the server pods, worker pods, and Redis clusters as buildfarm's architecture requires, along with k8s Services and Ingresses so that I can send builds to it remotely.
However, when I tried to execute a build, I got the following:
eito@fuji:~/MyRepo$ bazel --client_debug run //tools:ipython3 --config=rbe
[INFO 11:03:07.374 src/main/cpp/option_processor.cc:407] Looking for the following rc files: /etc/bazel.bazelrc,/home/eito/MyRepo/.bazelrc,/home/eito/.bazelrc
[INFO 11:03:07.374 src/main/cpp/rc_file.cc:56] Parsing the RcFile /home/eito/MyRepo/.bazelrc
[INFO 11:03:07.374 src/main/cpp/rc_file.cc:56] Parsing the RcFile user.bazelrc
[INFO 11:03:07.374 src/main/cpp/rc_file.cc:129] Skipped optional import of user.bazelrc, the specified rc file either does not exist or is not readable.
[INFO 11:03:07.374 src/main/cpp/rc_file.cc:56] Parsing the RcFile /home/eito/.bazelrc
[INFO 11:03:07.374 src/main/cpp/blaze.cc:1626] Debug logging requested, sending all client log statements to stderr
[INFO 11:03:07.374 src/main/cpp/blaze.cc:1509] Acquired the client lock, waited 0 milliseconds
[INFO 11:03:07.377 src/main/cpp/blaze.cc:1697] Trying to connect to server (timeout: 30 secs)...
[INFO 11:03:07.385 src/main/cpp/blaze.cc:1264] Connected (server pid=113490).
[INFO 11:03:07.385 src/main/cpp/blaze.cc:1974] Releasing client lock, let the server manage concurrent requests.
INFO: Invocation ID: c97091ec-e335-4882-8107-c9084d4453ff
ERROR: Failed to query remote execution capabilities: connection timed out: buildfarm.dev.azr.internal.mydomain.com/172.33.33.99:8980
[INFO 11:03:37.613 src/main/cpp/blaze.cc:2093] failure_detail: message: "Failed to query remote execution capabilities: connection timed out: buildfarm.dev.azr.internal.mydomain.com/172.33.33.99:8980"
remote_execution {
code: CAPABILITIES_QUERY_FAILURE
}
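For reference, the same Capabilities endpoint can be probed outside of Bazel. This is only a sketch: it assumes grpcurl is installed and that the buildfarm server exposes gRPC reflection (otherwise the remote-apis protos would have to be supplied via -proto); -insecure matches the self-signed certificate from cert-manager.

# Query the Remote Execution API Capabilities service through the ingress hostname
# (add "instance_name" to the request body if the server uses a non-empty instance name)
grpcurl -insecure -d '{}' \
  buildfarm.dev.azr.internal.mydomain.com:8980 \
  build.bazel.remote.execution.v2.Capabilities/GetCapabilities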
My worker Deployment & Service look like this (the server's are very similar, just with a different image and a different ConfigMap mounted):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: aks-buildfarm-worker
  namespace: infrastructure--buildfarm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: aks-buildfarm
  template:
    metadata:
      labels:
        app: aks-buildfarm
        role: app
    spec:
      containers:
        - name: buildfarm-worker
          image: mydomain.azurecr.io/buildfarm-memory-worker:v8
          volumeMounts:
            - mountPath: "/config"
              name: buildfarm-worker-config
          ports:
            - containerPort: 8980
              protocol: TCP
          resources:
            limits:
              memory: 256Mi
              cpu: "300m"
      volumes:
        - name: buildfarm-worker-config
          configMap:
            name: buildfarm-worker-config
---
apiVersion: v1
kind: Service
metadata:
  name: aks-buildfarm
  namespace: infrastructure--buildfarm
spec:
  type: ClusterIP
  ports:
    - protocol: TCP
      name: grpc
      port: 8980
      targetPort: 8980
  selector:
    app: aks-buildfarm
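To make sure the Service actually selects the pods, I can check its endpoints (plain kubectl, nothing buildfarm-specific):

# The Service should list the pod IPs on port 8980
kubectl get endpoints -n infrastructure--buildfarm aks-buildfarm
kubectl get pods -n infrastructure--buildfarm -o wide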
I'm mostly using the example configs from the buildfarm repo, deployed as ConfigMaps on k8s: https://github.com/bazelbuild/bazel-buildfarm/blob/main/examples/shard-server.config.example and https://github.com/bazelbuild/bazel-buildfarm/blob/main/examples/worker.config.example
The only difference is that in the worker config I point every localhost:8980 at "aks-buildfarm-server.infrastructure--buildfarm.svc.cluster.local", since the workers and the server are in the same k8s cluster and can communicate through that Service.
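For completeness, this is roughly how I build the worker ConfigMap from the example config; the sed pattern and file names are just an illustration of the substitution described above, and keeping the gRPC port at 8980 is my assumption:

# Replace localhost with the server's in-cluster DNS name, keeping the port
sed 's/localhost:8980/aks-buildfarm-server.infrastructure--buildfarm.svc.cluster.local:8980/g' \
  worker.config.example > worker.config
# Recreate the ConfigMap that the worker Deployment mounts at /config
kubectl create configmap buildfarm-worker-config \
  --namespace infrastructure--buildfarm \
  --from-file=worker.config \
  --dry-run=client -o yaml | kubectl apply -f -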
My Ingress looks like this:
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  namespace: infrastructure--buildfarm
  name: buildfarm-ingress
  annotations:
    kubernetes.io/ingress.class: nginx-internal
    nginx.ingress.kubernetes.io/rewrite-target: /$1
    nginx.ingress.kubernetes.io/use-regex: "true"
    cert-manager.io/cluster-issuer: selfsigned-cluster-issuer
spec:
  rules:
    - host: buildfarm.dev.azr.internal.mydomain.com
      http:
        paths:
          - backend:
              serviceName: aks-buildfarm
              servicePort: 8980
            path: /(.*)
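To rule out basic reachability problems, I'd check DNS and the port from the client machine, plus what the ingress controller actually exposes (generic checks, not buildfarm-specific):

# Does the hostname resolve to the internal ingress IP shown in the error (172.33.33.99)?
nslookup buildfarm.dev.azr.internal.mydomain.com
# Is anything listening on 8980 at that address?
nc -vz buildfarm.dev.azr.internal.mydomain.com 8980
# What does the ingress controller expose?
kubectl get ingress -n infrastructure--buildfarm buildfarm-ingress
kubectl get svc -n infrastructure--buildfarm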
My .bazelrc file looks like this:
build:rbe --remote_executor=grpcs://buildfarm.dev.azr.internal.mydomain.com:8980
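To isolate the ingress from the buildfarm server itself, I can also bypass it with a port-forward and point Bazel straight at the Service; using grpc:// instead of grpcs:// here assumes TLS is only terminated at the ingress:

# Forward the ClusterIP Service to localhost, bypassing the ingress entirely
kubectl port-forward -n infrastructure--buildfarm svc/aks-buildfarm 8980:8980
# In another shell, run the same target against the forwarded port
# (the command-line flag overrides the value pulled in by --config=rbe)
bazel run //tools:ipython3 --config=rbe --remote_executor=grpc://localhost:8980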