Describe the issue
I'm using the fluent-operator to deploy fluentdbit to collect logs and fluentd to process and send to an OpenSearch domain with advanced security configuration.
It works with open domains, but not with secured ones.
I noticed the Operator creates a Service Account for Fluentbit and Fluentd by default. I then proceeded to attach an IAM Role for Service Account(IRSA) to Fluentd's Service Account with the following inlinePolicy
:
apiVersion: auth.XXX.XXX.com/v1
kind: IRSA
metadata:
name: fluent-test
namespace: fluent-system
annotations:
auth.XXX.XXX.com/serviceaccount: managed
spec:
serviceAccount: fluentd
nameOverride: fluent-test
path: /XXXX/
inlinePolicy: |
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "es:*",
"Resource": [
"arn:aws:es:*:XXXX:domain/*"
]
}
]
}
But the Fluentd pod still can't communicate with the specified domain:
The client is unable to verify distribution due to security privileges on the server side. Some functionality may not be compatible if the server is running an unsupported product.
2023-03-15 09:27:50 +0000 [warn]: #0 [ClusterFluentdConfig-cluster-fluentd-config::cluster::clusteroutput::fluentd-output-opensearch-0] Could not communicate to OpenSearch, resetting connection and trying again. [401]
2023-03-15 09:27:50 +0000 [warn]: #0 [ClusterFluentdConfig-cluster-fluentd-config::cluster::clusteroutput::fluentd-output-opensearch-0] Remaining retry: 14. Retry to communicate after 2 second(s).
2023-03-15 09:27:54 +0000 [warn]: #0 [ClusterFluentdConfig-cluster-fluentd-config::cluster::clusteroutput::fluentd-output-opensearch-0] Could not communicate to OpenSearch, resetting connection and trying again. [401]
After applying the IRSA, its irsa-operator generates the equivalent role in AWS with the correct inlinePolicy and even mentions the OpenSearch Service as "Allowed Services". It also correctly attaches the IRSA to the fluentd service account in the EKS cluster.
I've also used the IAM Policy Simulator, which seems to indicate my role/policy is correct:
I'm starting to wonder if it's possible at all to use IRSA to give fluentd access to a secured OpenSearch domain...
I noticed that fluentbit output plugin for opensearch has some parameters to deal with authentication and IAM roles, but fluentd's doesn't.
Is it an unavoidable limitation? Has anyone ever used the fluent-operator in a fluentbit-fluentd mode with fluentd using IRSA to connect to AWS OpenSearch?
To Reproduce
- Provision an OpenSearch domain with advanced security options using this Terraform provider.
I used the following inputs:
dedicated_master_enabled = "true"
dedicated_master_count = "3"
dedicated_master_type = "r5.large.search"
automated_snapshot_start_hour = "0"
domain_name = "any-name"
engine_version = "OpenSearch_2.3"
instance_type = "m5.large.search"
instance_count = 3
subnet_ids = [...]
volume_size = 50
vpc_id = your-vpc-id
default_zone_awareness_config = false
zone_awareness_enabled = true
create_iam_service_linked_role = "false"
encryption_enabled = "true"
enforce_https = "true"
node_to_node_encryption_enabled = "true"
retention_in_days = 7
warm_instance_enabled = "true"
warm_instance_type = "ultrawarm1.medium.search"
warm_instance_count = 2
cold_storage_enabled = "true"
sg_egress_all_enabled = "true"
sg_ingress_443_enabled = "true"
sg_ingress_9200_enabled = "true"
# Custom Endpoint
#custom_endpoint_fqdn = "your-custom-endpoint-fqdn"
#custom_endpoint_certificate_arn = ...
# SSO
advanced_security_enabled = "true"
anonymous_auth_enabled = "false"
master_user_arn = "..."
saml_master_user_name = "..."
saml_master_backend_role = "..."
internal_user_database_enabled = "false"
## Okta Integration
saml_enabled = "true"
saml_entity_id = "http://www.okta.com/XXXX"
saml_metadata_content = file("./saml-metadata.xml")
# COGNITO
/*
cognito_options_enabled = "true"
cognito_user_pool_id = "us-west-2_xxxx"
cognito_identity_pool_id = "..."
cognito_role_arn = "XXX"
*/
- Install the fluent-operator, fluentbit and fluentd as instructed in the
How did you install fluent operator?
section below.
Expected behavior
- All fluent-operator, fluentbit and fluentd pods up and running.
- Fluentbit collecting logs and forwarding to fluentd.
- Fluentd shipping logs to a secured OpenSearch domain.
Your Environment
- Fluent Operator version: 2.0.1
- Container Runtime: Docker
- Operating system: Linux(Ubuntu)
- Kernel version: 5.4.0-135-generic
How did you install fluent operator?
I installed the operator via helm chart with fluentbit and fluentd disabled:
helm repo add fluent https://fluent.github.io/helm-charts
helm repo update
helm install fluent-operator fluent/fluent-operator --create-namespace -n fluent-system --version 2.0.2 --values values.yaml
My custom values.yaml had the following configuration:
containerRuntime: docker
Kubernetes: false
operator:
initcontainer:
repository: "docker"
tag: "20.10"
container:
repository: "kubesphere/fluent-operator"
tag: v2.0.1
resources:
limits:
cpu: 100m
memory: 60Mi
requests:
cpu: 100m
memory: 20Mi
logLevel: debug
fluentd:
enable: false
I then, applied fluentbit and fluentd manifests manually:
apiVersion: fluentbit.fluent.io/v1alpha2
kind: FluentBit
metadata:
labels:
app.kubernetes.io/name: fluent-bit
name: fluent-bit
namespace: fluent-system
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: node-role.kubernetes.io/edge
operator: DoesNotExist
fluentBitConfigName: fluent-bit-config
image: kubesphere/fluent-bit:v2.0.9
positionDB:
hostPath:
path: /var/lib/fluent-bit/
resources:
limits:
cpu: 500m
memory: 200Mi
requests:
cpu: 10m
memory: 25Mi
tolerations:
- operator: Exists
---
apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterInput
metadata:
labels:
fluentbit.fluent.io/component: logging
fluentbit.fluent.io/enabled: "true"
name: docker
spec:
systemd:
db: /fluent-bit/tail/systemd.db
dbSync: Normal
path: /var/log/journal
systemdFilter:
- _SYSTEMD_UNIT=docker.service
- _SYSTEMD_UNIT=kubelet.service
tag: service.*
---
apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterInput
metadata:
labels:
fluentbit.fluent.io/component: logging
fluentbit.fluent.io/enabled: "true"
name: tail
spec:
tail:
db: /fluent-bit/tail/pos.db
dbSync: Normal
memBufLimit: 5MB
parser: docker
path: /var/log/containers/*.log
refreshIntervalSeconds: 10
skipLongLines: true
tag: kube.*
---
apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterFluentBitConfig
metadata:
labels:
app.kubernetes.io/name: fluent-bit
name: fluent-bit-config
spec:
filterSelector:
matchLabels:
fluentbit.fluent.io/enabled: "true"
inputSelector:
matchLabels:
fluentbit.fluent.io/enabled: "true"
outputSelector:
matchLabels:
fluentbit.fluent.io/enabled: "true"
service:
parsersFile: parsers.conf
---
apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterFilter
metadata:
labels:
fluentbit.fluent.io/component: logging
fluentbit.fluent.io/enabled: "true"
name: kubernetes
spec:
filters:
- kubernetes:
annotations: true
kubeCAFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
kubeTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
kubeURL: https://kubernetes.default.svc:443
labels: true
- nest:
addPrefix: kubernetes_
nestedUnder: kubernetes
operation: lift
- modify:
rules:
- remove: stream
- remove: kubernetes_pod_id
- remove: kubernetes_host
- remove: kubernetes_container_hash
- nest:
nestUnder: kubernetes
operation: nest
removePrefix: kubernetes_
wildcard:
- kubernetes_*
match: kube.*
---
apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterOutput
metadata:
labels:
fluentbit.fluent.io/component: logging
fluentbit.fluent.io/enabled: "true"
name: fluentd
spec:
forward:
host: fluentd.fluent-system.svc
port: 24224
matchRegex: (?:kube|service)\.(.*)
apiVersion: fluentd.fluent.io/v1alpha1
kind: Fluentd
metadata:
name: fluentd
namespace: fluent-system
labels:
app.kubernetes.io/name: fluentd
spec:
globalInputs:
- forward:
bind: 0.0.0.0
port: 24224
replicas: 1
image: kubesphere/fluentd:v1.15.3
resources:
limits:
cpu: 500m
memory: 500Mi
requests:
cpu: 100m
memory: 128Mi
fluentdCfgSelector:
matchLabels:
config.fluentd.fluent.io/enabled: "true"
---
apiVersion: fluentd.fluent.io/v1alpha1
kind: ClusterFluentdConfig
metadata:
labels:
config.fluentd.fluent.io/enabled: "true"
name: fluentd-config
spec:
clusterFilterSelector:
matchLabels:
filter.fluentd.fluent.io/enabled: "true"
clusterOutputSelector:
matchLabels:
output.fluentd.fluent.io/enabled: "true"
watchedNamespaces: # find an easier way to do this or open an issue
- kube-system
- fluent-system
- default
---
apiVersion: fluentd.fluent.io/v1alpha1
kind: ClusterOutput
metadata:
labels:
output.fluentd.fluent.io/enabled: "true"
name: fluentd-output-opensearch
spec:
outputs:
- opensearch:
host: vpc-XXX-us-XXX-XXXX-XXXX.us-XXX-XXX.es.amazonaws.com
logstashFormat: true
logstashPrefix: logs
port: 443
scheme: https
logLevel: debug # change to info after OpenSearchErrorHandler is fixed