
I really hope you can help me with a matter I have been struggling with for quite some time.

Istio Version

client version: 1.14.1
control plane version: 1.14.1
data plane version: 1.14.1 (130 proxies)

Kubectl Version

Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.4", GitCommit:"872a965c6c6526caa949f0c6ac028ef7aff3fb78", GitTreeState:"clean", BuildDate:"2022-11-09T13:36:36Z", GoVersion:"go1.19.3", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.17+rke2r1", GitCommit:"953be8927218ec8067e1af2641e540238ffd7576", GitTreeState:"clean", BuildDate:"2023-02-28T21:40:04Z", GoVersion:"go1.19.6 X:boringcrypto", Compiler:"gc", Platform:"linux/amd64"}

I have Istio installed and sidecar containers injected throughout my namespace.

There are various components which work out of the box using mTLS connections.

Istio is installed either from the official Helm chart repo, via an IstioOperator manifest with istioctl, or taken from the Kubeflow Helm chart on GitHub.

Everything works when it comes to a few core components that perform checks via an initContainer over HTTP to a Kubernetes Service; once those checks complete, the system starts.

initContainers:
- command:
  - sh
  - -c
  - until curl -X GET "elasticsearch:9200/_cluster/health?wait_for_status=yellow&timeout=50s&pretty";
    do echo waiting for elasticsearch; sleep 2; done
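
For context, that fragment sits inside the pod spec roughly like this (the deployment name and images below are placeholders, not my real ones):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: core-component                # placeholder name
spec:
  selector:
    matchLabels:
      app: core-component
  template:
    metadata:
      labels:
        app: core-component
    spec:
      initContainers:
      - name: wait-for-elasticsearch
        image: curlimages/curl        # any image that ships curl
        command:
        - sh
        - -c
        - until curl -X GET "elasticsearch:9200/_cluster/health?wait_for_status=yellow&timeout=50s&pretty";
          do echo waiting for elasticsearch; sleep 2; done
      containers:
      - name: core-component
        image: core-component:latest  # placeholder image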

When I enforce STRICT peer authentication, everything collapses and I am hit by connection failures. The snippet below is an example of that.

curl: (56) Recv failure: Connection reset by peer waiting for elasticsearch

I can also see similar behavior for database instances, with proxy log entries like this:

"- - -" 0 NR filter_chain_not_found - "-" 0 0 0 - "-" "-" "-" "-" "-" - -

As far as I can tell from Istio's docs, the above issue is related to a missing DestinationRule.
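
If I understand that part of the docs correctly, it would mean adding a per-host rule along these lines (a sketch only; the host and namespace are assumptions based on my Elasticsearch Service, not something I know to be required):

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: elasticsearch-mtls
  namespace: my-namespace                 # assumption: the namespace where elasticsearch runs
spec:
  host: elasticsearch.my-namespace.svc.cluster.local
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL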

The same goes for Jobs that try to install something on a Pod.

command: [sh, -c, "sleep 10; curl -XPUT -H 'Content-Type: application/json' -T /mnt/license/license.bin 'http://elasticsearch:9200/_siren/license'"]

The license just doesn't get applied to the pod, and thus Elasticsearch keeps failing.
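
For completeness, that command runs inside a Job that looks roughly like this (the names, image, and license Secret are placeholders reconstructed from memory):

apiVersion: batch/v1
kind: Job
metadata:
  name: apply-es-license                  # placeholder name
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: apply-license
        image: curlimages/curl            # any image that ships curl
        command: [sh, -c, "sleep 10; curl -XPUT -H 'Content-Type: application/json' -T /mnt/license/license.bin 'http://elasticsearch:9200/_siren/license'"]
        volumeMounts:
        - name: license
          mountPath: /mnt/license
      volumes:
      - name: license
        secret:
          secretName: siren-license       # assumption: license.bin is mounted from a Secret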

Another example: if I try to curl a Kubernetes Service from the istio-proxy container on any pod:

istio-proxy@index-upload-ab6055a0aa6c3477b41a-np8k9:/$ curl -v -k http://index-es-http:9200
*   Trying 10.43.121.43:9200...
* TCP_NODELAY set
* Connected to index-es-http (10.43.121.43) port 9200 (#0)
> GET / HTTP/1.1
> Host: index-es-http:9200
> User-Agent: curl/7.68.0
> Accept: */*
>
* Recv failure: Connection reset by peer
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer

istio-proxy@index-es-mdi-0:/$ curl http://k8s-service:9091
curl: (56) Recv failure: Connection reset by peer

I have also decided to use cert-manager as the CA. That didn't solve the matter either.

Here are my PeerAuthentication policy, DestinationRule, and AuthorizationPolicy, all placed within the istio-system namespace:

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: mtls
  namespace: istio-system
spec:
  host: '*.cluster.local'
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
---
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: mtls-policy
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
---
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-all
  namespace: istio-system
spec:
  rules:
  - {}

I have also tried creating Gateways to serve everything on port 80, and tried adding cluster-local-gateway as a component.
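
For reference, the Gateway I experimented with looked roughly like this (a sketch from memory; the selector is the default ingress gateway label and may not match my actual setup):

apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: default-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway                 # assumption: the default istio-ingressgateway selector
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"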

What am I missing here? Should every failing service have its own VirtualService? But then again, why can I not curl ANY of the pods inside the mesh when the STRICT policy is enabled?
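
To make the VirtualService question concrete, this is the kind of resource I have in mind for each failing service (purely hypothetical, using Elasticsearch as the example):

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: elasticsearch
  namespace: my-namespace                 # hypothetical namespace
spec:
  hosts:
  - elasticsearch
  http:
  - route:
    - destination:
        host: elasticsearch
        port:
          number: 9200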

The strict policy is also affecting my internal ingress controllers: they cannot serve calls. I either get a 502 Bad Gateway or nothing at all.

Since I am kind of new to this, I am sure the information above might be insufficient, so ask me for whatever else is needed.

  • Did you verify that the sidecar containers were actually spun up and running? Sometimes you need to rollout-restart your deployments for this to happen; the same might apply when you change auth policies. So I'd first recommend trying a rollout restart of all deployments. – MrE Jun 07 '23 at 02:25
  • No, that's not the case, that would be too easy. Policies may take some time to take effect, but most of the time it happens automatically. I have tried restarts many times; that's not the issue. – diDaster Jun 07 '23 at 06:52
  • Kindly check the application pod's access log (istio-proxy) after running "curl http://k8s-service:9091", and run the curl command from the application container instead of the istio-proxy container. The access log will give a clue as to why exactly the requests are failing. – Nataraj Medayhal Jun 07 '23 at 13:02

0 Answers