Questions tagged [distributed-tracing]

Distributed Tracing aims to provide better observability into distributed systems and microservices for purposes of performance monitoring and troubleshooting issues.

Distributed Tracing

Modern Internet services are often implemented as complex, large-scale distributed systems. These applications are constructed from collections of software modules that may be developed by different teams, perhaps in different programming languages, and could span many thousands of machines across multiple physical facilities. Tools that aid in understanding system behavior and reasoning about performance issues are invaluable in such an environment.

Source: Dapper, a Large-Scale Distributed Systems Tracing Infrastructure

How it works in a nutshell

Distributed Tracing works by recording the entry and exit points a request passes through, along with useful intermediate data and metrics, until the final response is served to the requester. Some Distributed Tracing systems collect this information fully automatically, while others require manual instrumentation of the code.

When entering a system, the request is usually assigned a unique Trace ID. This ID is then propagated to any participating systems. Information gathered this way is sent to some sort of backend collecting the data. The collector then aggregates the data via the Trace ID, thus showing the full request as it passed through the distributed system.
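The flow described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular tracing system is implemented: the `x-trace-id` header name, the `COLLECTOR` list, and the functions `handle_request`, `record_span`, and `aggregate` are all hypothetical names chosen for the example, and service calls are simulated with plain function calls instead of network requests.

```python
import uuid
from collections import defaultdict

# Hypothetical in-memory stand-in for a tracing backend.
COLLECTOR = []


def record_span(trace_id, service):
    """Report that `service` did some work on behalf of `trace_id`."""
    COLLECTOR.append({"trace_id": trace_id, "service": service})


def handle_request(service, headers, downstream=()):
    """Reuse the incoming trace ID, or assign one at the system's entry point."""
    trace_id = headers.get("x-trace-id") or uuid.uuid4().hex
    for next_service in downstream:
        # Propagate the same ID to every participating service.
        handle_request(next_service, {"x-trace-id": trace_id})
    record_span(trace_id, service)
    return trace_id


def aggregate(spans):
    """Group reported spans by trace ID, as a collector would."""
    traces = defaultdict(list)
    for span in spans:
        traces[span["trace_id"]].append(span["service"])
    return dict(traces)


# The frontend receives a request without a trace ID and calls two services.
trace_id = handle_request("frontend", {}, downstream=("orders", "billing"))
traces = aggregate(COLLECTOR)
```

After this runs, `traces[trace_id]` lists all three services, which is the "full request" view that the Trace ID makes possible.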

Metrics usually collected include request time, latency, errors, and status codes, among others.
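As a rough illustration of manual instrumentation, the sketch below records a duration, an optional status code, and any error for a block of code. The `span` context manager, the `finished` list, and the field names are assumptions made for this example and do not correspond to the API of any specific tracing library.

```python
import time
from contextlib import contextmanager


@contextmanager
def span(name, sink):
    """Time a block of code and append the resulting record to `sink`."""
    record = {"name": name, "error": None, "status_code": None}
    start = time.perf_counter()
    try:
        yield record
    except Exception as exc:
        # Keep the error on the record, then let it propagate unchanged.
        record["error"] = repr(exc)
        raise
    finally:
        record["duration_ms"] = (time.perf_counter() - start) * 1000.0
        sink.append(record)


finished = []

# A successful operation that sets its own status code.
with span("GET /orders", finished) as current:
    current["status_code"] = 200

# A failing operation: the error is recorded and still raised to the caller.
try:
    with span("GET /broken", finished) as current:
        raise RuntimeError("backend unavailable")
except RuntimeError:
    pass
```

Real instrumentation libraries attach records like these to the current Trace ID before exporting them to a backend.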

Open Source implementations:

Several Open Source implementations for Distributed Tracing exist:

  • http://opencensus.io

    A single distribution of libraries for metrics and distributed tracing with minimal overhead that allows you to export data to multiple backends.

  • http://opentracing.io

    Vendor-neutral APIs and instrumentation for distributed tracing.

  • http://zipkin.io

    Zipkin is a distributed tracing system. It helps gather timing data needed to troubleshoot latency problems in microservice architectures. It manages both the collection and lookup of this data.

  • http://www.jaegertracing.io

    Jaeger, inspired by Dapper and OpenZipkin, is a distributed tracing system released as open source by Uber Technologies. It is used for monitoring and troubleshooting microservices-based distributed systems.

There is also a W3C working group aiming to standardize context propagation across the various Distributed Tracing systems.
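The W3C Trace Context format carries this context in a `traceparent` HTTP header made of four dash-separated, lowercase hex fields: version, trace ID (32 characters), parent ID (16 characters), and trace flags (2 characters). The sketch below parses such a header and derives the header a service would send downstream. It is a simplified illustration that only accepts version `00`; the function names `parse_traceparent` and `child_traceparent` are hypothetical.

```python
import re
import secrets

# version "00" - 32 hex trace ID - 16 hex parent ID - 2 hex flags
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(value):
    """Return the header's fields as a dict, or None if it is not valid."""
    match = _TRACEPARENT.match(value.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero IDs are defined as invalid.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "flags": flags,
        "sampled": bool(int(flags, 16) & 0x01),
    }


def child_traceparent(incoming):
    """Keep the trace ID and flags, but use a fresh ID for the outgoing call."""
    context = parse_traceparent(incoming)
    if context is None:
        return None
    new_parent_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    return "00-{}-{}-{}".format(context["trace_id"], new_parent_id, context["flags"])


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
context = parse_traceparent(incoming)
outgoing = child_traceparent(incoming)
```

Because every hop keeps the trace ID and only replaces the parent ID, any backend that understands the header can stitch the hops back into one trace.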

Because Distributed Tracing is crucial for application performance monitoring, most APM vendors have adopted it in one way or another. Notable APM vendors offering Distributed Tracing include AppDynamics, Dynatrace, Instana, Lightstep, and New Relic.

219 questions
1
vote
1 answer

Passing trace id from spring web app(using cloud sleuth) to non spring web application when hitting an api

I have two services, A and B. A is written in Spring Boot; B is a web app written in Java with a custom framework. I want to pass the trace ID generated by Spring Cloud Sleuth when calling an API from A to B, and then use it in logging in B. If B…
1
vote
1 answer

Can we add parameters to the default traces generated by spring-cloud-sleuth?

I have integrated spring-cloud-sleuth with my Spring-boot application. I have also used open-telemetry as the log exporter. When I call a REST API in my application, I can see a span being generated. However, this span does not have information…
1
vote
0 answers

Standard for exposing a distributed tracing trace ID in an HTTP response header

There is good documentation (OpenTelemetry project, W3C) on generating, passing, and parsing distributed tracing headers (e.g., traceparent) on the server side, however I have not found anything authoritative on exposing trace IDs to the client…
1
vote
0 answers

When using a proxy to log function calls within a .NET object, what's the best way to capture calls from within the proxied object?

During a recent assignment, I was tasked to test out OpenTelemetry as a way to enhance visibility in our (ASP.NET Core) app. One of the goals I was hoping to achieve involved providing function-level traces throughout each of our services, but a…
1
vote
0 answers

Distributed Tracing with Service Bus Triggered Function

According to the docs around distributed tracing for Service Bus, the Diagnostic-Id property should be used to store the W3C traceparent. When an Azure Function is triggered via a service bus message, I would expect this value to be used to…
NSjonas
1
vote
0 answers

Set custom traceId in spring sleuth

I have an Angular application using a tracing library to trace each operation (user button click). After the SPA is loaded, this application sends a list of traces in the request body to the backend microservice to log them. In the backend microservice,…
1
vote
0 answers

Kubernetes services not appearing in Jaeger UI

Is there a way to get Kubernetes services to register in Jaeger? I have Jaeger v1.37.0 installed using helm. But services don't seem to get registered with it. I've gone through some documentation which suggest that the Zipkin port needs to be…
Metro
1
vote
2 answers

Opentelemetry collector vs Instana collector

We have a bunch of microservices that currently use the Instana collector for tracing. I am aware it is a broad question, but even though I am personally a supporter of OpenTelemetry, I could not find any comparison between otel collector and rest of…
semural
1
vote
1 answer

sleuth does not show Trace Id / Span Id in logs while WebClient Rest call

On a REST API call with WebClient, a few default logs are printed like the ones below, but Sleuth doesn't add the trace ID to them. See below: 2022-08-10 10:18:26.123 DEBUG [cib_bulk,,] 1 --- [or-http-epoll-1] r.netty.http.client.HttpClientConnect : [7c54bef8-1,…
1
vote
0 answers

sentry multiple projects for one service: combine into one sentry project or use multiple sentry projects?

My company is running one service comprised of multiple components: react (nextjs), react-native, flask server, fast-api server. Would it be a good idea to combine them into one 'sentry project'? Currently, distributed-tracing (sentry-performance)…
1
vote
1 answer

Can the value of log4j ThreadContext map get overlapped when multiple requests are made simultaneously?

I am using log4j ThreadContext for tracing in a spring boot application. I have created an interceptor by implementing HandlerInterceptor which intercepts a request and then sets 'x' value in the ThreadContext map using…
1
vote
0 answers

Jaeger: ignore certain spans in a tracing

I'm just getting started with implementing Jaeger in our pipeline, using python and the opentelemetry packages. A single task has let's say 5 steps. Step 1 can take 1-2 days. Then step 2 can take 3-4 hours. And then steps 3-5 typically take 5-30…
Hashcut
1
vote
0 answers

Azure API Management and Application Insights, traces not correlated for requests done with send-request policy in inbound scope

Problem: I'm building a composition API resource where I have defined some send-request policies in the inbound scope. For observability we use Application Insights, and I've activated this in APIM. The logging in APIM is configured on a global…
1
vote
1 answer

Redis Cache Calls In OpenTelemetry in DotNet

Hi, in my project we are using services.AddStackExchangeRedisCache(options => { options.Configuration = ""; }); for Redis cache calls. Now, to trace all calls while requesting an API endpoint, I'm using…
Vivek
1
vote
1 answer

Tracing AWS Step Functions with Splunk (via X-Ray)

AWS supports tracing Step Functions with X-Ray with a one click instrument step. Once activated, the step function context is propagated through all the lambda functions. Can Splunk Observability Suite (APM) use the x-ray context data? Or is there…
michael