Questions tagged [observability]

monitoring the internal state of a system by looking at its output

Observability is the ability to answer any question about a business or application by collecting and analyzing data. Succinctly, it’s an approach to understanding how a system operates by reviewing its output. In software, observability is generally framed in terms of the ‘three pillars’, or telemetry data types: metrics, traces, and logs. Combining these three types of data lets you answer questions about your business or application that you may not have known you’d need to ask when you set it up.

83 questions
0
votes
1 answer

Is there a way of finding the Observable/Unobservable decomposition using Python (maybe from the control library)?

I am currently working on a project where I need to decompose my system into observable and unobservable subsystems efficiently, so I was looking for a function that could help me with that. PS: I know about this function and it is not what I…
Maya
0
votes
1 answer

Handling logs of huge volume with fluent-bit/fluentd

We have the following observability stack. We are often challenged by a huge influx of logs from certain apps running on ECS, which causes the log aggregator to restart and eventually makes ES unstable. We incorporated a few ways to alleviate…
fledgling
0
votes
2 answers

OpenTelemetry JVM and System metrics

I have used Micrometer (micrometer.io) for most of my career to collect metrics. One of its coolest features is the binding that collects information about the host system and JVM: https://micrometer.io/docs/ref/jvm on the basis of which it was possible to…
dsinczak
0
votes
1 answer

How can I group two Prometheus timeseries on a new label using PromQL?

Let's say I have two Prometheus timeseries, ts1 and ts2. I would like to combine them to create a new timeseries, tsK, which will have a label identifying the constituent timeseries, i.e. tsK{inner_ts="ts1"} should yield the original ts1 timeseries…
information_interchange
0
votes
1 answer

New Relic Helm chart installation

We are trying to set up the open-source newrelic-infrastructure app locally on our machine in Kubernetes. It's giving an error message saying "It requires license key " GitHub URL…
0
votes
1 answer

Distributed tracing library - Custom trace id

As part of our Spring application, we are using Spring Sleuth to inject the trace ID and span ID into requests. This works neatly with SLF4J via MDC integration to propagate them to the logs as well. But we are running into issues with our organization not using B3…
0
votes
1 answer

Need help building an uptime dashboard for a distributed system

I have a product for which I would like to create a dashboard to show its availability/uptime over time and display any outages. Specifically, I am looking for the ability to report historical information on service uptime and provide details on any service…