
We are going to have a high load on the telemetry service. I'm looking for a solution that is able to scale both the collector and the backend (Zipkin).

There is a solution for scaling Zipkin. It seems simple: just use the collector's internal load balancing:

    loadbalancing:
      protocol:
        otlp:
          timeout: 1s
          insecure: true
      resolver:
        static:
          hostnames:
          - localhost:55690
          - localhost:55700
          - localhost:55710
          - localhost:55720

But I can't find examples of using multiple OpenTelemetry Collectors. There is no trouble running several instances of the collector, but how can I tell "myApp" to balance between them? There is no such option in the exporters.

What is the right way to scale such a system?

  • Alex, did you find a solution to this? I'm also looking for the same. – Quade May 18 '21 at 05:32
  • I haven't found a good solution yet. My temporary solution is otel-agent-contrib -> otel-collector -> jaeger-collector. otel-agent-contrib is able to balance requests to the otel-collector. I believe Jaeger will support OTLP, so I will get rid of the otel-collector: https://github.com/jaegertracing/jaeger/issues/2934#issuecomment-818654475 – Alexander Kazantsev May 19 '21 at 10:18

2 Answers


According to the otel-collector/performance README, you can start multiple collectors behind a load balancer or use a Kubernetes deployment.
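
As a rough illustration of the Kubernetes route, a Deployment plus a Service could look like the sketch below; the names, image tag, replica count, and ports are my assumptions, not something the README prescribes:

    # Hypothetical sketch: several collector replicas behind one Service.
    # Names, image tag, and replica count are illustrative; adapt to your setup.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: otel-collector
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: otel-collector
      template:
        metadata:
          labels:
            app: otel-collector
        spec:
          containers:
          - name: otel-collector
            image: otel/opentelemetry-collector-contrib:0.81.0
            ports:
            - containerPort: 4317  # OTLP gRPC
    ---
    # The Service acts as the off-the-shelf load balancer in front of the replicas.
    apiVersion: v1
    kind: Service
    metadata:
      name: otel-collector
    spec:
      selector:
        app: otel-collector
      ports:
      - name: otlp-grpc
        port: 4317
        targetPort: 4317

Keep in mind that OTLP over gRPC uses long-lived connections, so a plain Service balances per connection rather than per request; with few clients, a gRPC-aware proxy in front may spread load more evenly.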

Natashz

In general, the right way to balance requests to a cluster of OpenTelemetry Collectors is to use an off-the-shelf load balancer.

If, for some reason, you need to balance based on payload attributes, such as trace ID, you can use the load balancing exporter from the OpenTelemetry Collector, the one you linked above. In that scenario, you'd have a layer of load balancers that would export data to a second layer of OTel Collectors, each configured to export to one backing Zipkin server.
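
As a rough sketch of that two-layer setup (the hostnames, ports, and pipeline wiring here are illustrative assumptions), the loadbalancing exporter, which routes by trace ID by default, feeds a second tier that exports to Zipkin:

    # Layer 1: receives from apps and shards spans by trace ID
    # across the second-tier collectors.
    receivers:
      otlp:
        protocols:
          grpc:

    exporters:
      loadbalancing:
        protocol:
          otlp:
            timeout: 1s
            tls:
              insecure: true  # older collector versions take `insecure: true` directly under otlp
        resolver:
          static:
            hostnames:
            - collector-tier2-a:4317
            - collector-tier2-b:4317

    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [loadbalancing]

    ---
    # Layer 2: each instance exports to one backing Zipkin server.
    receivers:
      otlp:
        protocols:
          grpc:

    exporters:
      zipkin:
        endpoint: http://zipkin:9411/api/v2/spans

    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [zipkin]

The point of the first layer is that all spans sharing a trace ID land on the same second-tier collector, which is what makes trace-scoped processing such as tail-based sampling possible there.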

You might also want to keep an eye on the state of Connectors, a new component type of the Collector, which would eventually allow you to skip the second layer, so that the load balancer can send data straight to your Zipkin instances.

I can recommend two resources related to this:

  1. A presentation I delivered at KubeCon on the deployment patterns, including the scaling scenario I mentioned here: https://www.youtube.com/watch?v=WhRrwSHDBFs

  2. The new scaling documentation page that was published recently: https://opentelemetry.io/docs/collector/scaling/

jpkroehling
  • I have two concerns with the load balancing exporter, especially when the gateway deployment pattern is used and tail sampling is to be performed. 1. What happens when new collector instances are added or removed in the second tier for ongoing transactions (some spans are yet to be received)? Do the collectors in the first tier maintain a trace-ID-to-collector-instance mapping internally to handle this case (and if so, how is this mapping shared among all collectors in the first tier)? 2. If a span is lost on the network, then what happens to the trace which is held in the memory of a collector? – shadow0wolf Aug 14 '23 at 00:10