4

I am currently doing research on the service mesh Istio, version 1.6. The data plane (the Envoy proxies) is configured by the control plane; in particular, Pilot (part of istiod) is responsible for propagating routing rules and configuration to the Envoys. I am wondering how this communication works:

  1. Is it a single gRPC stream that is opened when the sidecar container starts for the first time and stays open for the sidecar's whole lifecycle? If the mesh changes, does Pilot use this stream to inform Envoy about the changes via the xDS API, i.e. are updates based on a push strategy? Or does the sidecar poll for new configs at a defined interval?
  2. What is the role of the istio-agent (the former Pilot and Citadel agents) in the sidecar container, especially the former Pilot agent (I know the Citadel agent is part of the CSR process)? Does it poll for new configs, or does it only bootstrap Envoy? And if the latter, why is it always running?

Thanks in advance!

nikos

1 Answer

2
  1. The best explanation of how Istio's Envoy proxies work comes from the Envoy documentation. It is actually a lot more complicated than it seems:

Initialization

How Envoy initializes itself when it starts up is complex. This section explains at a high level how the process works. All of the following happens before any listeners start listening and accepting new connections.

  • During startup, the cluster manager goes through a multi-phase initialization where it first initializes static/DNS clusters, then predefined EDS clusters. Then it initializes CDS if applicable, waits for one response (or failure) for a bounded period of time, and does the same primary/secondary initialization of CDS provided clusters.

  • If clusters use active health checking, Envoy also does a single active health check round.

  • Once cluster manager initialization is done, RDS and LDS initialize (if applicable). The server waits for a bounded period of time for at least one response (or failure) for LDS/RDS requests. After which, it starts accepting connections.

  • If LDS itself returns a listener that needs an RDS response, Envoy further waits for a bounded period of time until an RDS response (or failure) is received. Note that this process takes place on every future listener addition via LDS and is known as listener warming.

  • After all of the previous steps have taken place, the listeners start accepting new connections. This flow ensures that during hot restart the new process is fully capable of accepting and processing new connections before the draining of the old process begins.

A key design principle of initialization is that an Envoy is always guaranteed to initialize within initial_fetch_timeout, with a best effort made to obtain the complete set of xDS configuration within that subject to the management server availability.
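The multi-phase flow above is driven by Envoy's bootstrap file, which in an Istio 1.6 sidecar is generated by the istio-agent before Envoy starts. The sketch below is illustrative, not the exact file Istio writes: the node ID, cluster name, timeouts, and the istiod address/port are assumptions for illustration. The key point it shows is that CDS and LDS are both delivered over a single ADS gRPC stream to a statically defined management-server cluster, with `initial_fetch_timeout` bounding the wait described above.

```yaml
# Illustrative Envoy bootstrap sketch (values are assumptions, not Istio's exact output).
node:
  id: sidecar~10.0.0.1~myapp-12345.default~default.svc.cluster.local
dynamic_resources:
  # All xDS resources arrive over one Aggregated Discovery Service (ADS) stream.
  ads_config:
    api_type: GRPC
    grpc_services:
      - envoy_grpc:
          cluster_name: xds-grpc
  cds_config:
    ads: {}
    initial_fetch_timeout: 10s
  lds_config:
    ads: {}
    initial_fetch_timeout: 10s
static_resources:
  clusters:
    # Statically defined cluster pointing at the management server (istiod).
    - name: xds-grpc
      type: STRICT_DNS
      connect_timeout: 1s
      http2_protocol_options: {}
      load_assignment:
        cluster_name: xds-grpc
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: istiod.istio-system.svc
                      port_value: 15012
```

Because the ADS stream is a long-lived bidirectional gRPC stream, the management server can push updated resources to Envoy whenever the mesh changes; Envoy does not poll on an interval.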

As for updating envoy config:

Runtime configuration

Envoy supports “runtime” configuration (also known as “feature flags” and “decider”). Configuration settings can be altered that will affect operation without needing to restart Envoy or change the primary configuration. The currently supported implementation uses a tree of file system files. Envoy watches for a symbolic link swap in a configured directory and reloads the tree when that happens. This type of system is very commonly deployed in large distributed systems. Other implementations would not be difficult to implement. Supported runtime configuration settings are documented in the relevant sections of the operations guide. Envoy will operate correctly with default runtime values and a “null” provider so it is not required that such a system exists to run Envoy.
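The symlink-swap mechanism described above can be sketched in a few shell commands. The directory layout and the flag name used here are illustrative assumptions, not real Envoy runtime keys; the point is the pattern: write a complete new tree, then atomically repoint one symlink that Envoy watches.

```shell
# Sketch of the symlink-swap pattern Envoy's file-system runtime
# provider watches (paths and flag names here are illustrative).
mkdir -p /tmp/envoy-runtime/v1/health_check
echo "false" > /tmp/envoy-runtime/v1/health_check/fail_traffic
# Envoy would be configured to watch /tmp/envoy-runtime/current.
ln -sfn /tmp/envoy-runtime/v1 /tmp/envoy-runtime/current

# To change a runtime value without restarting Envoy, write a complete
# new tree and swap the symlink; Envoy reloads when the link changes.
mkdir -p /tmp/envoy-runtime/v2/health_check
echo "true" > /tmp/envoy-runtime/v2/health_check/fail_traffic
ln -sfn /tmp/envoy-runtime/v2 /tmp/envoy-runtime/current

cat /tmp/envoy-runtime/current/health_check/fail_traffic   # prints "true"
```

The swap is done on the symlink rather than on individual files so that readers never observe a half-written tree.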


More information about how the Envoy proxy works can be found here.


  2. According to the Istio documentation:

The benefit of consolidation: introducing istiod

Having established that many of the common benefits of microservices didn’t apply to the Istio control plane, we decided to unify them into a single binary: istiod (the ’d’ is for daemon).

Let’s look at the benefits of the new packaging:

  • Installation becomes easier. Fewer Kubernetes deployments and associated configurations are required, so the set of configuration options and flags for Istio is reduced significantly. In the simplest case, you can start the Istio control plane, with all features enabled, by starting a single Pod.

  • Configuration becomes easier. Many of the configuration options that Istio has today are ways to orchestrate the control plane components, and so are no longer needed. You also no longer need to change cluster-wide PodSecurityPolicy to deploy Istio.

  • Using VMs becomes easier. To add a workload to a mesh, you now just need to install one agent and the generated certificates. That agent connects back to only a single service.

  • Maintenance becomes easier. Installing, upgrading, and removing Istio no longer require a complicated dance of version dependencies and startup orders. For example: To upgrade, you only need to start a new istiod version alongside your existing control plane, canary it, and then move all traffic over to it.

  • Scalability becomes easier. There is now only one component to scale.

  • Debugging becomes easier. Fewer components means less cross-component environmental debugging.

  • Startup time goes down. Components no longer need to wait for each other to start in a defined order.

  • Resource usage goes down and responsiveness goes up. Communication between components becomes guaranteed, and not subject to gRPC size limits. Caches can be shared safely, which decreases the resource footprint as a result.

istiod unifies functionality that Pilot, Galley, Citadel and the sidecar injector previously performed, into a single binary.

A separate component, the istio-agent, helps each sidecar connect to the mesh by securely passing configuration and secrets to the Envoy proxies. While the agent, strictly speaking, is still part of the control plane, it runs on a per-pod basis. We’ve further simplified by rolling per-node functionality that used to run as a DaemonSet, into that per-pod agent.
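For question 2, the agent's role is easiest to see in the injected sidecar container spec. The excerpt below is a trimmed, illustrative sketch of what the sidecar injector adds in Istio 1.6; the image tag and exact arguments vary by version and mesh configuration:

```yaml
# Trimmed, illustrative istio-proxy sidecar container (Istio 1.6 era).
- name: istio-proxy
  image: docker.io/istio/proxyv2:1.6.0
  # pilot-agent is the container's main process: it fetches workload
  # certificates, writes the Envoy bootstrap config, and starts Envoy.
  # It then stays running to serve SDS (certificate rotation) to Envoy
  # and to manage Envoy's lifecycle (hot restarts, draining on shutdown),
  # which is why it is always present rather than exiting after bootstrap.
  args:
    - proxy
    - sidecar
    - --domain
    - $(POD_NAMESPACE).svc.cluster.local
    - --proxyLogLevel=warning
```

So the agent does not poll for routing configuration itself; Envoy holds the xDS stream to istiod, while the agent handles bootstrap, secrets, and lifecycle.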

Hope it helps.

Piotr Malec
  • Thanks for your answer! So do I understand correctly that Envoy first connects to Pilot during startup and requests the different xDS APIs, but once Envoy is running and Pilot notices a change in the mesh, Pilot uses the active gRPC connection to update the Envoy config? – nikos Jun 07 '20 at 08:51