Before moving to a more sophisticated automation approach (Terraform and/or a Helm chart), I am trying to get a dev AWS EKS environment working with this guide: https://aws-otel.github.io/docs/introduction
These steps go fine:
kubectl apply -f https://amazon-eks.s3.amazonaws.com/docs/addons-otel-permissions.yaml
eksctl create iamserviceaccount \
  --name adot-collector \
  --namespace opentelemetry-operator-system \
  --cluster <MY-CLUSTER> \
  --attach-policy-arn arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess \
  --attach-policy-arn arn:aws:iam::aws:policy/AWSXrayWriteOnlyAccess \
  --attach-policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy \
  --approve \
  --override-existing-serviceaccounts
The next part of the guide gets a little confusing because it states that you can either do this:
aws eks create-addon --addon-name adot --cluster-name <your_cluster_name>
or, if you wish to pass in a more customized configuration, do this:
aws eks create-addon \
  --cluster-name <YOUR-EKS-CLUSTER-NAME> \
  --addon-name adot \
  --configuration-values file://configuration-values.json \
  --resolve-conflicts=OVERWRITE
My goal is to create the Collector in "statefulset" mode, but no matter what I put in the configuration-values.json file, nothing is ever created for the Collector: no StatefulSet, no pods. The operator pod is the only thing that gets created, and I cannot find anything meaningful in the operator log; it looks like standard startup output.
This is the configuration-values.json file I am trying:
{
  "replicaCount": 1,
  "manager": {
    "resources": {
      "limits": {
        "cpu": "200m",
        "memory": "256Mi"
      },
      "requests": {
        "cpu": "100m",
        "memory": "128Mi"
      }
    }
  },
  "kubeRBACProxy": {
    "resources": {
      "limits": {
        "cpu": "50m",
        "memory": "64Mi"
      },
      "requests": {
        "cpu": "10m",
        "memory": "32Mi"
      }
    }
  },
  "collector": {
    "mode": "statefulset",
    "serviceAccount": {
      "create": false,
      "name": "adot-collector"
    },
    "resources": {
      "limits": {
        "cpu": "1",
        "memory": "2Gi"
      },
      "requests": {
        "cpu": "500m",
        "memory": "1Gi"
      }
    }
  }
}
I am not sure what the issue might be. The aws eks create-addon command completes successfully, but there are never any Collector pods or a StatefulSet. Could this be a lack of resources in my EKS cluster (it is a small, 3-node dev cluster)?
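A minimal sketch of checks that could separate the two possibilities (nothing ever requested a Collector versus a Collector that cannot be scheduled), assuming kubectl and the AWS CLI are already pointed at the cluster. The function names and the NAMESPACE variable are hypothetical helpers, not part of the guide. If the first check lists nothing, that would suggest no OpenTelemetryCollector resource exists at all, which would be consistent with the operator logging "no instances to upgrade"; I have not confirmed that this is how the add-on is meant to behave.

```shell
#!/bin/sh
# Diagnostic sketch only. Function names and NAMESPACE are hypothetical;
# the kubectl / aws subcommands themselves are standard.

NAMESPACE="${NAMESPACE:-opentelemetry-operator-system}"

# List OpenTelemetryCollector custom resources in every namespace.
# The operator logs in this post show it watching this kind.
list_collector_resources() {
  kubectl get opentelemetrycollectors --all-namespaces
}

# Show Pending pods and recent events in the operator namespace.
# A shortage of CPU or memory would normally surface here as
# FailedScheduling events on a Pending pod.
list_scheduling_problems() {
  kubectl get pods --all-namespaces --field-selector=status.phase=Pending
  kubectl get events -n "$NAMESPACE" --sort-by=.lastTimestamp
}

# Print the JSON schema of configuration values that a given version of
# the adot add-on accepts, to compare against configuration-values.json.
show_addon_schema() {
  version="$1"
  aws eks describe-addon-configuration \
    --addon-name adot \
    --addon-version "$version" \
    --query configurationSchema \
    --output text
}
```

The installed version to pass to show_addon_schema can be read with: aws eks describe-addon --cluster-name <MY-CLUSTER> --addon-name adot --query addon.addonVersion --output text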
I am adding logs from the operator:
- no collector pods:
❯ k get pods -n opentelemetry-operator-system
NAME READY STATUS RESTARTS AGE
opentelemetry-operator-79b9f86654-ntt9p 2/2 Running 0 3m16s
- operator logs:
I0814 21:11:50.958866 1 leaderelection.go:255] successfully acquired lease opentelemetry-operator-system/9f7554c3.opentelemetry.io
{"level":"info","ts":"2023-08-14T21:11:50Z","logger":"instrumentation-upgrade","msg":"looking for managed Instrumentation instances to upgrade"}
{"level":"info","ts":"2023-08-14T21:11:50Z","logger":"collector-upgrade","msg":"looking for managed instances to upgrade"}
{"level":"info","ts":"2023-08-14T21:11:50Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1alpha1.OpenTelemetryCollector"}
{"level":"info","ts":"2023-08-14T21:11:50Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.ConfigMap"}
{"level":"info","ts":"2023-08-14T21:11:50Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.ServiceAccount"}
{"level":"info","ts":"2023-08-14T21:11:50Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.Service"}
{"level":"info","ts":"2023-08-14T21:11:50Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.Deployment"}
{"level":"info","ts":"2023-08-14T21:11:50Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.DaemonSet"}
{"level":"info","ts":"2023-08-14T21:11:50Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.StatefulSet"}
{"level":"info","ts":"2023-08-14T21:11:50Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v2.HorizontalPodAutoscaler"}
{"level":"info","ts":"2023-08-14T21:11:50Z","msg":"Starting Controller","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector"}
{"level":"info","ts":"2023-08-14T21:11:51Z","logger":"collector-upgrade","msg":"no instances to upgrade"}
{"level":"info","ts":"2023-08-14T21:11:51Z","logger":"instrumentation-upgrade","msg":"no instances to upgrade"}
{"level":"info","ts":"2023-08-14T21:11:51Z","msg":"Starting workers","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","worker count":1}
Logs when trying "deployment" mode for the collector (the default):
k get pods -n opentelemetry-operator-system
NAME READY STATUS RESTARTS AGE
opentelemetry-operator-79b9f86654-lqnjd 2/2 Running 0 79s
❯ k logs opentelemetry-operator-79b9f86654-lqnjd -n opentelemetry-operator-system
{"level":"info","ts":"2023-08-14T21:29:42Z","msg":"Starting the OpenTelemetry Operator","opentelemetry-operator":"0.76.1-adot-46-g803a86e","opentelemetry-collector":"public.ecr.aws/aws-observability/aws-otel-collector:v0.30.0","opentelemetry-targetallocator":"public.ecr.aws/aws-observability/adot-operator-targetallocator:0.78.1","operator-opamp-bridge":"public.ecr.aws/aws-observability/adot-operator-opamp-bridge:0.78.0","auto-instrumentation-java":"public.ecr.aws/aws-observability/adot-autoinstrumentation-java:1.27.0","auto-instrumentation-nodejs":"public.ecr.aws/aws-observability/adot-operator-autoinstrumentation-nodejs:0.39.1","auto-instrumentation-python":"public.ecr.aws/aws-observability/adot-operator-autoinstrumentation-python:0.39b0","auto-instrumentation-dotnet":"public.ecr.aws/aws-observability/adot-operator-autoinstrumentation-dotnet:0.7.0","auto-instrumentation-go":"ghcr.io/open-telemetry/opentelemetry-go-instrumentation/autoinstrumentation-go:v0.2.1-alpha","auto-instrumentation-apache-httpd":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-apache-httpd:1.0.2","feature-gates":"operator.autoinstrumentation.apache-httpd,operator.autoinstrumentation.dotnet,-operator.autoinstrumentation.go,operator.autoinstrumentation.java,operator.autoinstrumentation.nodejs,operator.autoinstrumentation.python,-operator.collector.rewritetargetallocator","build-date":"2023-06-15T16:35:10Z","go-version":"go1.20.5","go-arch":"amd64","go-os":"linux","labels-filter":[]}
{"level":"info","ts":"2023-08-14T21:29:42Z","logger":"setup","msg":"the env var WATCH_NAMESPACE isn't set, watching all namespaces"}
{"level":"info","ts":"2023-08-14T21:29:42Z","logger":"controller-runtime.metrics","msg":"Metrics server is starting to listen","addr":"0.0.0.0:8080"}
{"level":"info","ts":"2023-08-14T21:29:42Z","logger":"controller-runtime.builder","msg":"Registering a mutating webhook","GVK":"opentelemetry.io/v1alpha1, Kind=OpenTelemetryCollector","path":"/mutate-opentelemetry-io-v1alpha1-opentelemetrycollector"}
{"level":"info","ts":"2023-08-14T21:29:42Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/mutate-opentelemetry-io-v1alpha1-opentelemetrycollector"}
{"level":"info","ts":"2023-08-14T21:29:42Z","logger":"controller-runtime.builder","msg":"Registering a validating webhook","GVK":"opentelemetry.io/v1alpha1, Kind=OpenTelemetryCollector","path":"/validate-opentelemetry-io-v1alpha1-opentelemetrycollector"}
{"level":"info","ts":"2023-08-14T21:29:42Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/validate-opentelemetry-io-v1alpha1-opentelemetrycollector"}
{"level":"info","ts":"2023-08-14T21:29:42Z","logger":"controller-runtime.builder","msg":"Registering a mutating webhook","GVK":"opentelemetry.io/v1alpha1, Kind=Instrumentation","path":"/mutate-opentelemetry-io-v1alpha1-instrumentation"}
{"level":"info","ts":"2023-08-14T21:29:42Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/mutate-opentelemetry-io-v1alpha1-instrumentation"}
{"level":"info","ts":"2023-08-14T21:29:42Z","logger":"controller-runtime.builder","msg":"Registering a validating webhook","GVK":"opentelemetry.io/v1alpha1, Kind=Instrumentation","path":"/validate-opentelemetry-io-v1alpha1-instrumentation"}
{"level":"info","ts":"2023-08-14T21:29:42Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/validate-opentelemetry-io-v1alpha1-instrumentation"}
{"level":"info","ts":"2023-08-14T21:29:42Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/mutate-v1-pod"}
{"level":"info","ts":"2023-08-14T21:29:42Z","logger":"setup","msg":"starting manager"}
{"level":"info","ts":"2023-08-14T21:29:42Z","msg":"Starting server","kind":"health probe","addr":"[::]:8081"}
{"level":"info","ts":"2023-08-14T21:29:42Z","logger":"controller-runtime.webhook.webhooks","msg":"Starting webhook server"}
{"level":"info","ts":"2023-08-14T21:29:42Z","msg":"starting server","path":"/metrics","kind":"metrics","addr":"[::]:8080"}
I0814 21:29:42.639882 1 leaderelection.go:245] attempting to acquire leader lease opentelemetry-operator-system/9f7554c3.opentelemetry.io...
{"level":"info","ts":"2023-08-14T21:29:42Z","logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}
{"level":"info","ts":"2023-08-14T21:29:42Z","logger":"controller-runtime.webhook","msg":"Serving webhook server","host":"","port":9443}
{"level":"info","ts":"2023-08-14T21:29:42Z","logger":"controller-runtime.certwatcher","msg":"Starting certificate watcher"}
I0814 21:29:42.648681 1 leaderelection.go:255] successfully acquired lease opentelemetry-operator-system/9f7554c3.opentelemetry.io
{"level":"info","ts":"2023-08-14T21:29:42Z","logger":"instrumentation-upgrade","msg":"looking for managed Instrumentation instances to upgrade"}
{"level":"info","ts":"2023-08-14T21:29:42Z","logger":"collector-upgrade","msg":"looking for managed instances to upgrade"}
{"level":"info","ts":"2023-08-14T21:29:42Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1alpha1.OpenTelemetryCollector"}
{"level":"info","ts":"2023-08-14T21:29:42Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.ConfigMap"}
{"level":"info","ts":"2023-08-14T21:29:42Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.ServiceAccount"}
{"level":"info","ts":"2023-08-14T21:29:42Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.Service"}
{"level":"info","ts":"2023-08-14T21:29:42Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.Deployment"}
{"level":"info","ts":"2023-08-14T21:29:42Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.DaemonSet"}
{"level":"info","ts":"2023-08-14T21:29:42Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.StatefulSet"}
{"level":"info","ts":"2023-08-14T21:29:42Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v2.HorizontalPodAutoscaler"}
{"level":"info","ts":"2023-08-14T21:29:42Z","msg":"Starting Controller","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector"}
{"level":"info","ts":"2023-08-14T21:29:42Z","logger":"instrumentation-upgrade","msg":"no instances to upgrade"}
{"level":"info","ts":"2023-08-14T21:29:42Z","logger":"collector-upgrade","msg":"no instances to upgrade"}
{"level":"info","ts":"2023-08-14T21:29:42Z","msg":"Starting workers","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","worker count":1}
Still no collector:
❯ k get deployments -n opentelemetry-operator-system
NAME READY UP-TO-DATE AVAILABLE AGE
opentelemetry-operator 1/1 1 1 4m21s