I'm trying to install Eclipse Che on an existing two-node Kubernetes cluster using chectl. Some of the applications that Eclipse Che requires are already installed and working on my cluster: Postgres, Keycloak (as the OIDC provider), and Cert Manager. I pass chectl the following patch YAML (secret, password, and auth URL changed):
apiVersion: org.eclipse.che/v2
spec:
  networking:
    domain: kubernetes.default.svc.cluster.local
    annotations:
      kubernetes.io/ingress.class: nginx
    auth:
      identityProviderURL: https://sso.mydomain.com
      oAuthClientName: eclipse-che
      oAuthSecret: BBz8ZfGzGXY0vUfeoKgP8vczFrj3D5KU
  database:
    externalDb: true
    chePostgresHostname: 'postgres.postgres.svc.cluster.local'
    chePostgresPort: '5432'
    chePostgresUser: 'eclipse'
    chePostgresPassword: '83xf5tC85LA9ZX4w'
    chePostgresDb: 'eclipse'
I'm calling chectl with:
chectl server:deploy --platform=k8s --che-operator-cr-patch-yaml ~/che/patch.yaml --domain https://kubernetes.default.svc.cluster.local --skip-cert-manager --k8spodreadytimeout=500000 --k8spoderrorrechecktimeout=500000 --installer=operator --debug
The result is always the same (note that this is not the first attempt, so some resources already exist):
› Current Kubernetes context: 'kubernetes-admin@kubernetes'
✔ Verify Kubernetes API...[1.26]
✔ Kubernetes preflight checklist
✔ Verify if kubectl is installed...[OK]
✔ Verify domain is set...[OK]
✔ Create Namespace eclipse-che...[Exists]
↓ Install Cert Manager v1.8.2 [skipped]
✔ Start following Eclipse Che installation logs...[OK]
❯ Deploy Eclipse Che operator
↓ Install Dev Workspace operator [skipped]
→ Dev Workspace operator already installed
✔ Create ServiceAccount che-operator...[Exists]
✔ Create RBAC
✔ Create Role che-operator-leader-election...[Exists]
✔ Create Role che-operator...[Exists]
✔ Create RoleBinding che-operator-leader-election...[Exists]
✔ Create RoleBinding che-operator...[Exists]
✔ Create RoleBinding eclipse-che-che-operator...[Exists]
✔ Create RoleBinding eclipse-che-che-operator...[Exists]
✔ Create Certificate che-operator-serving-cert...[Exists]
✔ Create Issuer che-operator-selfsigned-issuer...[Exists]
✔ Create Service che-operator-service...[Exists]
✔ Create CRD checlusters.org.eclipse.che...[Exists]
✔ Waiting...[OK]
✔ Create Deployment che-operator...[Exists]
❯ Eclipse Che Operator pod bootstrap
✔ Scheduling...[OK]
✔ Downloading images...[OK]
✖ Starting
→ Failed to start a pod, reason: Error, exitCode: 1
Create ValidatingWebhookConfiguration org.eclipse.che
Create MutatingWebhookConfiguration org.eclipse.che
Create CheCluster Custom Resource
Error: Command server:deploy failed with the error: Failed to start a pod, reason: Error, exitCode: 1 See details: /home/myuser/.cache/chectl/error.log. Eclipse Che logs:
/tmp/chectl-logs/1681617965184.
The output of /home/myuser/.cache/chectl/error.log is:
2023-04-16T04:06:30.649Z Error: Command server:deploy failed with the error: Failed to start a pod, reason: Error, exitCode: 1 See details: /home/myuser/.cache/chectl/error.log. Eclipse Che logs: /tmp/chectl-logs/1681617965184.
2023-04-16T04:06:30.649Z at Object.newError (/usr/local/lib/chectl/lib/utils/utls.js:41:19)
2023-04-16T04:06:30.649Z at Object.wrapCommandError (/usr/local/lib/chectl/lib/utils/command-utils.js:53:19)
2023-04-16T04:06:30.649Z at Deploy.<anonymous> (/usr/local/lib/chectl/lib/commands/server/deploy.js:122:44)
2023-04-16T04:06:30.649Z at Generator.throw (<anonymous>)
2023-04-16T04:06:30.649Z at rejected (/usr/local/lib/chectl/node_modules/tslib/tslib.js:165:69)
2023-04-16T04:06:30.649Z at runMicrotasks (<anonymous>)
2023-04-16T04:06:30.649Z Cause: Error: Failed to start a pod, reason: Error, exitCode: 1
2023-04-16T04:06:30.649Z at /usr/local/lib/chectl/lib/tasks/pod-tasks.js:192:35
2023-04-16T04:06:30.649Z at Generator.next (<anonymous>)
2023-04-16T04:06:30.649Z at fulfilled (/usr/local/lib/chectl/node_modules/tslib/tslib.js:164:62)
2023-04-16T04:06:30.649Z at runMicrotasks (<anonymous>)
The operator pod log at /tmp/chectl-logs/1681617965184/eclipse-che/che-operator-69bfb7c98-thm9l/che-operator.log shows the following (as a side note, adding --debug doesn't seem to change the verbosity of the logging at all):
2023-04-16T04:01:13.725Z ERROR Unable determine installation platform {"error": "could not read API groups: Get \"https://10.96.0.1:443/api?timeout=32s\": dial tcp 10.96.0.1:443: i/o timeout"}
runtime.doInit
/usr/lib/golang/src/runtime/proc.go:6230
runtime.main
/usr/lib/golang/src/runtime/proc.go:233
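The failing call in that log is a TCP dial to 10.96.0.1:443 that times out. A minimal sketch of the same reachability check follows, assuming bash (with its /dev/tcp redirection) and the coreutils `timeout` command are available wherever it is run, for example in a debug pod on the same node. The function name `probe_tcp` is made up for illustration and is not part of chectl or kubectl.

```shell
#!/usr/bin/env bash
# probe_tcp HOST PORT [SECONDS]
# Hypothetical helper: tries to open a TCP connection to HOST:PORT and
# reports whether it succeeded before the timeout (default 5 seconds).
probe_tcp() {
  local host="$1" port="$2" secs="${3:-5}"
  # bash treats /dev/tcp/HOST/PORT as "open a TCP socket";
  # `timeout` kills the attempt if it hangs, as in the i/o timeout above.
  if timeout "$secs" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "reachable: $host:$port"
  else
    echo "unreachable: $host:$port"
    return 1
  fi
}
```

Running `probe_tcp 10.96.0.1 443 5` from a pod and then from the node itself would show whether the timeout is specific to pod networking. Many minimal container images ship without bash, so this only works where bash is present.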
chectl creates two pods, but both always end up in CrashLoopBackOff:
devworkspace-controller devworkspace-controller-manager-685cb85df7-27cqc 1/2 CrashLoopBackOff 167 (4m49s ago) 12h
eclipse-che che-operator-69bfb7c98-dg9wc 0/1 CrashLoopBackOff 9 (3m3s ago) 29m
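One way to get at what the crashing containers printed before they exited is `kubectl logs --previous`. The sketch below is only an illustration: it assumes the input has the same column order as the listing above (namespace, name, ready, status, ...), as produced by `kubectl get pods -A --no-headers`, and the function name `crashloop_log_cmds` is hypothetical.

```shell
#!/usr/bin/env bash
# crashloop_log_cmds
# Hypothetical helper: reads pod listing lines on stdin
# (columns: NAMESPACE NAME READY STATUS ...) and prints one
# `kubectl logs --previous` command per pod in CrashLoopBackOff.
crashloop_log_cmds() {
  awk '$4 == "CrashLoopBackOff" {
    # --previous shows the log of the last terminated container instance;
    # --all-containers covers multi-container pods such as the 1/2 one above.
    printf "kubectl logs --previous --all-containers -n %s %s\n", $1, $2
  }'
}
```

Typical use would be `kubectl get pods -A --no-headers | crashloop_log_cmds`, then running the printed commands one at a time. The helper only prints commands and does not execute them.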
There doesn't seem to be any way to see what chectl is actually doing so that I can step through the install manually and troubleshoot it. Googling suggests that loads of people are having problems at the same point in the install, but there are no answers. I can only assume that they ultimately give up and go for a different solution.
I'm at a loss as to where to look next or how to troubleshoot this any further. Has anyone been able to get past this point in an Eclipse Che install?