
We are going to have a high load on the telemetry service. I'm looking for a solution that is able to scale both the collector and the backend (Zipkin).

There is a solution for scaling Zipkin. It seems simple: just use the collector's internal load balancing:

    loadbalancing:
      protocol:
        otlp:
          timeout: 1s
          insecure: true
      resolver:
        static:
          hostnames:
          - localhost:55690
          - localhost:55700
          - localhost:55710
          - localhost:55720

But I can't find examples of using multiple OpenTelemetry Collectors. There is no trouble running several instances of the collector, but how can I tell "myApp" to balance between them? There is no such option in the exporters.

What is the right way to scale such a system?

  • Alex, did you find a solution to this? I'm also looking for the same. – Quade May 18 '21 at 05:32
  • I haven't found a good solution yet. My temporary solution is otel-agent-contrib -> otel-collector -> jaeger-collector. otel-agent-contrib is able to balance requests to the otel-collector. I believe Jaeger will support OTLP, so I will get rid of the otel-collector: https://github.com/jaegertracing/jaeger/issues/2934#issuecomment-818654475 – Alexander Kazantsev May 19 '21 at 10:18

2 Answers


According to the otel-collector/performance README, you can start multiple collectors behind a load balancer or use a Kubernetes deployment.
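
As a rough illustration of the Kubernetes route, a Deployment plus a Service could look like the sketch below; the names, image tag, replica count, and ports are my assumptions, not something the README prescribes:

    # Hypothetical sketch: several collector replicas behind one Service.
    # Names, image tag, and replica count are illustrative; adapt to your setup.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: otel-collector
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: otel-collector
      template:
        metadata:
          labels:
            app: otel-collector
        spec:
          containers:
          - name: otel-collector
            image: otel/opentelemetry-collector-contrib:0.81.0
            ports:
            - containerPort: 4317  # OTLP gRPC
    ---
    # The Service acts as the off-the-shelf load balancer in front of the replicas.
    apiVersion: v1
    kind: Service
    metadata:
      name: otel-collector
    spec:
      selector:
        app: otel-collector
      ports:
      - name: otlp-grpc
        port: 4317
        targetPort: 4317

Keep in mind that OTLP over gRPC uses long-lived connections, so a plain Service balances per connection rather than per request; with few clients, a gRPC-aware proxy in front may spread load more evenly.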

Natashz

In general, the right way to balance requests to a cluster of OpenTelemetry Collectors is to use an off-the-shelf load balancer.

If, for some reason, you need to balance based on payload attributes, such as trace ID, you can use the load balancing exporter from the OpenTelemetry Collector, the one you linked above. In that scenario, you'd have a layer of load balancers that would export data to a second layer of OTel Collectors, each configured to export to one backing Zipkin server.
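
As a rough sketch of that two-layer setup (the hostnames, ports, and pipeline wiring here are illustrative assumptions), the loadbalancing exporter, which routes by trace ID by default, feeds a second tier that exports to Zipkin:

    # Layer 1: receives from apps and shards spans by trace ID
    # across the second-tier collectors.
    receivers:
      otlp:
        protocols:
          grpc:

    exporters:
      loadbalancing:
        protocol:
          otlp:
            timeout: 1s
            tls:
              insecure: true  # older collector versions take `insecure: true` directly under otlp
        resolver:
          static:
            hostnames:
            - collector-tier2-a:4317
            - collector-tier2-b:4317

    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [loadbalancing]

    ---
    # Layer 2: each instance exports to one backing Zipkin server.
    receivers:
      otlp:
        protocols:
          grpc:

    exporters:
      zipkin:
        endpoint: http://zipkin:9411/api/v2/spans

    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [zipkin]

The point of the first layer is that all spans sharing a trace ID land on the same second-tier collector, which is what makes trace-scoped processing such as tail-based sampling possible there.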

You might also want to keep an eye on the state of Connectors, a new component type of the Collector, which would eventually allow you to skip the second layer, so that the load balancer can send data straight to your Zipkin instances.

I can recommend two resources related to this:

  1. A presentation I delivered at KubeCon on the deployment patterns, including the scaling scenario I mentioned here: https://www.youtube.com/watch?v=WhRrwSHDBFs

  2. The new scaling documentation page that was published recently: https://opentelemetry.io/docs/collector/scaling/

jpkroehling
  • I have two concerns with the load balancing exporter, especially when the gateway deployment pattern is used and tail sampling is to be performed. 1. What happens when new collector instances are added or removed in the second tier for ongoing transactions (some spans are yet to be received)? Do the collectors in the first tier maintain a trace-ID-to-collector-instance mapping internally to handle this case (and if so, how is this mapping shared among all collectors in the first tier)? 2. If a span is lost on the network, then what happens to the trace which is held in the memory of a collector? – shadow0wolf Aug 14 '23 at 00:10