Questions tagged [distributed-tracing]

Distributed Tracing aims to provide better observability into distributed systems and microservices for purposes of performance monitoring and troubleshooting issues.

Distributed Tracing

Modern Internet services are often implemented as complex, large-scale distributed systems. These applications are constructed from collections of software modules that may be developed by different teams, perhaps in different programming languages, and could span many thousands of machines across multiple physical facilities. Tools that aid in understanding system behavior and reasoning about performance issues are invaluable in such an environment.

Source: Dapper, a Large-Scale Distributed Systems Tracing Infrastructure

How it works in a nutshell

Distributed Tracing works by recording the entry and exit points of a request, together with useful intermediate data and metrics, from the moment the request enters the system until the final response is served to the requester. Some Distributed Tracing systems collect this information fully automatically, while others require manual instrumentation of the code.

When a request enters the system, it is usually assigned a unique Trace ID. This ID is then propagated to every participating service. The information gathered this way is sent to a backend collector, which aggregates the data by Trace ID and thus shows the full path of the request through the distributed system.

Typical metrics include request time, latency, errors, and status codes, among others.
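
The flow described above can be sketched in a few lines of Python. This is a minimal, in-memory illustration and not how any particular tracing system is implemented: the `Span` and `Collector` classes, the `handle` function, and the `x-trace-id` / `x-parent-span-id` header names are all hypothetical names chosen for this example.

```python
import time
import uuid
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """One unit of work inside a trace (hypothetical structure)."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    name: str
    duration_ms: float
    status: int


class Collector:
    """Hypothetical in-memory backend that groups spans by Trace ID."""

    def __init__(self):
        self.traces = defaultdict(list)

    def report(self, span):
        self.traces[span.trace_id].append(span)


def handle(collector, name, headers, downstream=()):
    """Simulate a service handling a request and calling other services."""
    # Reuse the incoming Trace ID, or start a new trace at the entry point.
    trace_id = headers.get("x-trace-id") or uuid.uuid4().hex
    parent_id = headers.get("x-parent-span-id")
    span_id = uuid.uuid4().hex[:16]
    start = time.perf_counter()
    for child in downstream:
        # Propagate the Trace ID and the current span ID to each callee.
        handle(collector, child,
               {"x-trace-id": trace_id, "x-parent-span-id": span_id})
    duration_ms = (time.perf_counter() - start) * 1000
    collector.report(Span(trace_id, span_id, parent_id, name, duration_ms, 200))
    return trace_id


collector = Collector()
trace_id = handle(collector, "frontend", {}, downstream=("orders", "billing"))

# The collector can now show the whole request, aggregated by Trace ID.
for span in collector.traces[trace_id]:
    print(span.name, span.parent_id, f"{span.duration_ms:.3f} ms", span.status)
```

Each call gets its own span ID while sharing the Trace ID, so the collector can rebuild the call tree from the parent IDs even when one service calls another several times.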

Open Source implementations:

Several Open Source implementations for Distributed Tracing exist:

  • http://opencensus.io

    A single distribution of libraries for metrics and distributed tracing with minimal overhead that allows you to export data to multiple backends.

  • http://opentracing.io

    Vendor-neutral APIs and instrumentation for distributed tracing.

  • http://zipkin.io

    Zipkin is a distributed tracing system. It helps gather timing data needed to troubleshoot latency problems in microservice architectures. It manages both the collection and lookup of this data.

  • http://www.jaegertracing.io

    Jaeger, inspired by Dapper and OpenZipkin, is a distributed tracing system released as open source by Uber Technologies. It is used for monitoring and troubleshooting microservices-based distributed systems.

There is also a W3C working group aiming to standardize context propagation across various Distributed Tracing systems.
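
As an illustration of what standardized context propagation can look like, here is a minimal sketch that assumes the version `00` `traceparent` header layout from the W3C Trace Context specification (`version-traceid-parentid-flags`, lowercase hex). The function names `new_traceparent`, `parse_traceparent`, and `child_traceparent` are hypothetical helpers for this example and not part of any library.

```python
import re
import secrets

# Assumed layout: 2 hex version, 32 hex trace-id, 16 hex parent-id, 2 hex flags.
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def new_traceparent(sampled=True):
    """Start a new trace with random trace and parent (span) IDs."""
    flags = "01" if sampled else "00"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"


def parse_traceparent(value):
    """Return the header fields as a dict, or None if the value is invalid."""
    match = _TRACEPARENT.fullmatch(value.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero IDs are treated as invalid.
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


def child_traceparent(incoming):
    """Build the header for an outgoing call: same trace ID, new parent ID."""
    ctx = parse_traceparent(incoming) if incoming else None
    if ctx is None:
        return new_traceparent()  # no usable context, so start a new trace
    flags = "01" if ctx["sampled"] else "00"
    return f"00-{ctx['trace_id']}-{secrets.token_hex(8)}-{flags}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
print(parse_traceparent(incoming))
print(child_traceparent(incoming))
```

Because every participating system reads and writes the same header, the trace ID survives hops between services that use different tracing backends.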

Because Distributed Tracing is crucial for application performance monitoring, most APM vendors have adopted it in one way or another. Notable APM vendors offering Distributed Tracing include AppDynamics, Dynatrace, Instana, Lightstep, and New Relic.

219 questions
0
votes
1 answer

Tracing using Correlation-ID/etc when one service A makes multiple calls to service B

If a user calls service A, and service A then calls service B, tracing is simple using the correlation ID. Now if service A calls service B multiple times, the same correlationID gets used for each of those calls, which makes tracing slightly…
Jerald Baker
0
votes
1 answer

Can I trace every request using AWS X-Ray?

According to the docs, the X-Ray SDK applies a sampling algorithm to determine which requests get traced. By default, the X-Ray SDK records the first request each second, and five percent of any additional requests. Is it possible to trace all…
rusheb
0
votes
0 answers

Tempo confusing span order

I have a service foo calling a service bar running in k8s. Both services are Istio-enabled with tracing enabled. We use Tempo for distributed tracing. Most of the traces consist of a span foo that includes a span of bar (as expected); however, there…
0
votes
0 answers

Issue with exporting OpenTelemetry traces

I'm facing an issue when exporting traces to Grafana Tempo via the OpenTelemetry Collector. This is the error I get. Any help would be appreciated. Transient error StatusCode.UNAVAILABLE encountered while exporting traces, retrying in…
0
votes
0 answers

Is it possible to use tracing with multiprocessing?

I have a microservice developed using FastAPI. We have implemented Zipkin tracing across all of our microservices. My microservice uses multiprocessing to process an iterable in parallel. If I create a span, then call another function which creates…
KOB
0
votes
0 answers

parentSpanId not propagated in spring boot 2.6.x and spring cloud 2021.0.4

I have upgraded my project to Spring Boot 2.6.11 and Spring Cloud 2021.0.4. After that, the parentId disappeared from the log. I am using Logback for logging. Is there a way to display it? I added a logging pattern but it didn't change…
0
votes
1 answer

Spring Cloud Sleuth Parent-Span-Id is null and not propagating

I am using spring-cloud-sleuth. I can see the traceId and spanId in the logs, but parentTraceId is always null and I do not see it in the MDC. What can be the reason?
emanuel07
0
votes
1 answer

changing Activity.Current for the caller in an async method

I have a library I share among several microservices. The library contains helper methods to read messages from Kafka. As part of the message, I have a trace id that I want to use to start a new Activity: (simplified pseudo code) // code that is…
oocx
0
votes
1 answer

Datadog tracing: Can I add a span to multiple traces?

Is there a way to add a span to multiple traces in Datadog tracing? Our service receives orders that are batched into a transaction and then the transaction is processed. Each received order comes with its own trace-id. When processing the…
Julien__
0
votes
1 answer

OpenTelemetry running a separate collector vs Zipkin

I have a working example of OpenTelemetry with Java auto-instrumentation and I am using Zipkin for viewing the traces. My question is simple, but I have not found a clear answer yet. I want to configure samplers and filters for…
Metalhead
0
votes
1 answer

Jaeger distributed tracing with application integrated with 3rd party APIs

We are currently evaluating a distributed tracing tool for our event-driven microservice architecture, which looks somewhat similar to the picture below. As I understand it, all the applications integrated with Jaeger (Spring Boot application…
Anirudh
0
votes
0 answers

Websockets distributed tracing solution

How to add distributed tracing to websockets properly? As each websocket connection is long lived, a lot of messages end up getting traced under one activity. Is there any way to split that? (See below link for reference) Looks like there is an…
0
votes
1 answer

Configuring OpenTelemetry for tracing service-to-service calls ONLY

I am experimenting with different instrumentation libraries, but spring-cloud-sleuth and OpenTelemetry (OT) are the ones I liked the most. Spring-cloud-sleuth is simple, but it will not work for a non-Spring (JAX-RS) project, so I…
Metalhead
0
votes
1 answer

How to use Rust's tracing_distributed

I am trying to use the Rust tracing_distributed package, but I am getting strange and unhelpful errors when using it, and I am assuming I am using it wrong, but there is no documentation and there are no examples about how to use it. Here is an…
0
votes
0 answers

Why is Mono doOnSuccess not logging the Sleuth correlationId?

The doOnSuccess method of the Mono class is not logging the Sleuth correlationId in the logs. Response recieved from http://localhost:8080/testController/test is the log statement where the correlationId should have been logged. I have also checked the…