Highest Voted 'sre' Questions

0 votes, 1 answer

PromQL queries for SLIs (Service Level Indicators) using Prometheus/Grafana and Blackbox Exporter

I want to achieve the specified SLIs (Service Level Indicators) for our HTTP endpoints, using Blackbox Exporter for probing, with indicators like the following: 80% availability and latency less than 1s. For latency, I figured I can use the query…

asked Jun 06 '23 at 18:20 by sal
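The arithmetic behind the two SLIs in the question above can be sketched in Python. This is a minimal illustration, assuming each probe is reduced to the two values Blackbox Exporter reports as probe_success (1 or 0) and probe_duration_seconds; availability is then the mean of probe_success over the window (what avg_over_time(probe_success[...]) computes in PromQL), and the latency SLI is the share of successful probes that finished in under 1s. The ProbeSample type, the function names, and the sample values are hypothetical, and excluding failed probes from the latency SLI is a design choice, not a rule.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ProbeSample:
    """One probe result; fields mirror Blackbox Exporter's probe_success and
    probe_duration_seconds. The sample values used below are hypothetical."""
    success: int        # 1 = probe succeeded, 0 = probe failed
    duration_s: float   # how long the probe took, in seconds


def availability(samples: Sequence[ProbeSample]) -> float:
    """Fraction of probes that succeeded (like avg_over_time(probe_success[...]))."""
    if not samples:
        raise ValueError("no probe samples")
    return sum(s.success for s in samples) / len(samples)


def latency_sli(samples: Sequence[ProbeSample], threshold_s: float = 1.0) -> float:
    """Fraction of successful probes that completed in under threshold_s seconds."""
    ok = [s for s in samples if s.success == 1]
    if not ok:
        return 0.0
    return sum(1 for s in ok if s.duration_s < threshold_s) / len(ok)


if __name__ == "__main__":
    samples = [
        ProbeSample(1, 0.2),
        ProbeSample(1, 1.5),
        ProbeSample(0, 10.0),  # failed probe, excluded from the latency SLI
        ProbeSample(1, 0.4),
        ProbeSample(1, 0.9),
    ]
    print(availability(samples))  # 0.8, which meets an 80% availability target
    print(latency_sli(samples))   # 0.75 of successful probes were under 1s
```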

0 votes, 1 answer

Harbor registry proxy cache vs replication

I'm new to Harbor registry. I was asked to propose a Harbor architecture for my company. I first proposed an architecture based on a proxy cache, but the CISO refused to use a proxy cache for the enterprise without saying why. I proposed…

devops system-administration harbor sre trivy

asked Jun 01 '23 at 04:32 by mastertopg

0 votes, 0 answers

Does anyone have dataset that can be used for root cause analysis?

I need a lot of data to build a knowledge graph. Our team is trying to build a knowledge map, but there is not enough knowledge data.

influxdb chaos sre

asked May 09 '23 at 08:23 by zc lu

0 votes, 1 answer

Application monitoring using SQL and shell scripts

We are using shell scripts and SQL queries to monitor our application. We are planning to migrate to the cloud and use Prometheus and OpenSearch for monitoring. Is there a way to execute Oracle SQL queries (to get the number of active users, etc.) and store…

prometheus monitoring opensearch sre

asked Feb 28 '23 at 15:25 by user3069309

0 votes, 0 answers

How to create a Grafana alert for when a backup fails?

We have 3 PostgreSQL databases in GCP's Cloud SQL, all three of which are backed up daily. I need to use Grafana to monitor those backups and alert when they fail. Unfortunately, I'm not finding many resources to help me with this task. Is there…

google-cloud-platform grafana monitoring grafana-alerts sre

asked Jan 09 '23 at 17:30 by Filipe Cela

0 votes, 0 answers

Can SonarQube check whether code is observable?

As there is demand for reliable software systems (i.e., the site should be reliable, per SRE), there is a need to check whether the code is observable. Does SonarQube have any rules to check for this?

sonarqube sre observability

asked Nov 25 '22 at 05:41 by UmeshPathak

0 votes, 0 answers

RPN (Risk Priority Number) of FMEA Analysis and SLO

One of the concepts in FMEA (Failure Mode and Effects Analysis) is the RPN (Risk Priority Number), which decides how to prioritize your actions for addressing those failures. However, going by just severity, probability and effectiveness of control…

sre

asked Nov 02 '22 at 13:43 by kembhootha
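For the RPN question above, the standard FMEA formula multiplies three 1 to 10 ratings: RPN = severity x occurrence x detection, where occurrence is the "probability" factor and detection is the inverse of "effectiveness of control" (10 means the failure is practically undetectable). The Python sketch below, with hypothetical failure modes and values, shows the calculation and one commonly cited limitation: a rare but catastrophic failure can rank below a frequent, mild one, which is part of why teams compare RPN rankings with SLO impact.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class FailureMode:
    """A hypothetical FMEA entry; each factor is rated on the usual 1..10 scale."""
    name: str
    severity: int    # impact of the failure: 1 (negligible) .. 10 (catastrophic)
    occurrence: int  # likelihood of the failure: 1 (rare) .. 10 (almost certain)
    detection: int   # 1 (controls almost always catch it) .. 10 (undetectable)

    def __post_init__(self) -> None:
        # Reject ratings outside the 1..10 scale.
        for field_name in ("severity", "occurrence", "detection"):
            value = getattr(self, field_name)
            if not 1 <= value <= 10:
                raise ValueError(f"{field_name} must be between 1 and 10, got {value}")

    @property
    def rpn(self) -> int:
        """Risk Priority Number = severity x occurrence x detection (range 1..1000)."""
        return self.severity * self.occurrence * self.detection


def prioritize(modes: Sequence[FailureMode]) -> List[FailureMode]:
    """Order failure modes by descending RPN, i.e. the order FMEA would address them."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


if __name__ == "__main__":
    modes = [
        FailureMode("region outage", severity=10, occurrence=1, detection=2),
        FailureMode("slow dependency", severity=4, occurrence=6, detection=5),
    ]
    for m in prioritize(modes):
        print(m.name, m.rpn)  # slow dependency 120, then region outage 20
```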

0 votes, 1 answer

Should a not-found or empty response always be a 404?

I have a REST API endpoint that checks for the existence of a request (or a list of requests). It returns 200 OK if there is an order in progress, or 404 NOT FOUND if there are no current orders. While creating an availability SLO for this API, I noticed…

api rest http sre

asked Sep 09 '22 at 04:06 by Plinio Fabrycio
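The 404 question above turns on which responses an availability SLI counts as "bad". The Python sketch below, with hypothetical traffic, assumes one common convention in which only server-side errors (5xx) count against availability, so an expected "no current orders" 404 does not burn error budget, and lets you flip 404s to failures to see how much the SLI moves. This is an illustration of the trade-off, not a claim about how any particular SLO tool classifies status codes.

```python
from typing import Iterable


def availability(status_codes: Iterable[int], count_404_as_error: bool = False) -> float:
    """Fraction of 'good' responses.

    By default only 5xx responses count as bad, so an expected 404 ("no current
    orders") does not consume the error budget. Set count_404_as_error=True to
    see how the SLI changes if 404s are treated as failures instead.
    """
    codes = list(status_codes)
    if not codes:
        raise ValueError("no requests to evaluate")

    def is_bad(code: int) -> bool:
        if 500 <= code <= 599:
            return True
        return count_404_as_error and code == 404

    good = sum(1 for code in codes if not is_bad(code))
    return good / len(codes)


if __name__ == "__main__":
    # Hypothetical traffic: 6 orders found, 3 "no orders" lookups, 1 server error.
    codes = [200] * 6 + [404] * 3 + [503]
    print(availability(codes))                           # 0.9
    print(availability(codes, count_404_as_error=True))  # 0.6
```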

0 votes, 0 answers

Consul Serf Health Status

I have installed a Consul server (leader) on my localhost with an IP address of 192.168.48.1, and it is running OK. Then I installed a Vagrant box (Ubuntu 20.04) as a Consul agent, with an IP address of 10.0.2.15, and I informed about the bridge within the…

devops consul sre consul-health-check

asked Jul 19 '22 at 10:04 by YoussHark

0 votes, 1 answer

Prometheus: How do I write a PromQL query to detect a drastic increase or decrease of some X% in my graph that lasts for 10m, so that I can raise an alert?

I am trying to use a rate() query to compare the last 10 minutes with the previous 50 minutes, like this: sum(rate(cmd_get[10m])) / sum(rate(cmd_get[50m] offset 10m)). If I want to check whether the percentage increase is more than 50%, then what is the…

prometheus promql prometheus-alertmanager grafana-alerts sre

asked Jul 14 '22 at 19:40 by samantha
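The ratio in the question above maps to a percentage change as (ratio - 1) x 100, so "increased by more than 50%" is the same as ratio > 1.5, and "dropped by more than 50%" is ratio < 0.5. In PromQL the increase check could be written as sum(rate(cmd_get[10m])) / sum(rate(cmd_get[50m] offset 10m)) > 1.5, and the "stays for 10m" requirement is what the for field of a Prometheus alerting rule handles. The Python sketch below only reproduces that arithmetic; the function names are hypothetical, and sustained_breach is a rough stand-in for the for clause, assuming one evaluation per minute.

```python
from typing import Sequence, Tuple


def percent_change(current_rate: float, baseline_rate: float) -> float:
    """Percentage change of the current rate relative to the baseline rate.

    A current/baseline ratio of 1.5 is a +50% change; 0.5 is a -50% change.
    """
    if baseline_rate == 0:
        raise ZeroDivisionError("baseline rate is zero; percentage change is undefined")
    return (current_rate / baseline_rate - 1.0) * 100.0


def breaches(current_rate: float, baseline_rate: float, threshold_pct: float) -> bool:
    """True if the rate moved up or down by more than threshold_pct percent."""
    return abs(percent_change(current_rate, baseline_rate)) > threshold_pct


def sustained_breach(evaluations: Sequence[Tuple[float, float]],
                     threshold_pct: float, required: int) -> bool:
    """True if the last `required` (current, baseline) evaluations all breached.

    Roughly mimics a `for: 10m` clause: with a 1-minute evaluation interval,
    required=10 means the condition held for about 10 minutes.
    """
    if required < 1:
        raise ValueError("required must be at least 1")
    if len(evaluations) < required:
        return False
    return all(breaches(c, b, threshold_pct) for c, b in evaluations[-required:])


if __name__ == "__main__":
    print(percent_change(15, 10))  # 50.0
    print(percent_change(5, 10))   # -50.0
    history = [(16.0, 10.0)] * 10  # ten 1-minute evaluations, each +60%
    print(sustained_breach(history, threshold_pct=50, required=10))  # True
```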

0 votes, 1 answer

Alertmanager: how to send alerts only in weekdays?

I tried to add this to my alertmanager.yml at the root level, but I got this error: yaml: unmarshall errors: field time_intervals not found in type config.plain time_intervals: - times: weekdays: ['monday:friday'] (I used version 0.23 of…

prometheus sre prometheus-alertmanager observability

asked Jun 02 '22 at 14:29 by TestAutomator

0 votes, 1 answer

How to set SLO for operations that are dependent on file size?

I have an endpoint POST /upload that uploads files into my storage. The response time depends on the file size (the bigger the file, the longer it takes to respond with 200). How should I set a Service Level Objective (SLO) for this endpoint? Any…

sre service-level-agreement

asked May 25 '22 at 07:25 by NyamNyam
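For the file-size question above, one possible approach (another is to bucket uploads by size class with a separate threshold per bucket) is to give each request a latency budget that grows with its size and define the SLI as the fraction of uploads finishing within budget. The Python sketch below assumes a linear budget of a fixed overhead plus an allowance per MiB; base_s, per_mib_s, their values, and the sample uploads are hypothetical placeholders you would calibrate from your own measurements.

```python
from dataclasses import dataclass
from typing import Sequence

MIB = 2 ** 20  # bytes per mebibyte


@dataclass(frozen=True)
class Upload:
    size_bytes: int
    duration_s: float  # time until the 200 response was returned


def latency_budget_s(size_bytes: int, base_s: float = 0.5, per_mib_s: float = 0.1) -> float:
    """Hypothetical size-aware budget: fixed overhead plus an allowance per MiB."""
    return base_s + per_mib_s * (size_bytes / MIB)


def size_aware_latency_sli(uploads: Sequence[Upload],
                           base_s: float = 0.5, per_mib_s: float = 0.1) -> float:
    """Fraction of uploads that finished within their size-dependent budget."""
    if not uploads:
        raise ValueError("no uploads to evaluate")
    good = sum(
        1 for u in uploads
        if u.duration_s <= latency_budget_s(u.size_bytes, base_s, per_mib_s)
    )
    return good / len(uploads)


if __name__ == "__main__":
    uploads = [
        Upload(1 * MIB, 0.5),     # budget about 0.6s -> good
        Upload(100 * MIB, 9.0),   # budget 10.5s -> good
        Upload(100 * MIB, 12.0),  # budget 10.5s -> bad
        Upload(0, 0.7),           # budget 0.5s -> bad
    ]
    print(size_aware_latency_sli(uploads))  # 0.5
```

A linear budget keeps the objective a single number ("X% of uploads finish within their budget"), at the cost of having to pick the per-MiB allowance; size buckets are easier to explain but multiply the number of thresholds to maintain.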

0 votes, 1 answer

How can I OOM-kill a pod manually in Kubernetes?

I'm trying to manually OOM-kill pods for testing purposes. Does anyone know how I can achieve this?

linux kubernetes kubernetes-pod infrastructure sre

asked May 23 '22 at 14:23 by GreatBear

0 votes, 1 answer

Puppet 3 | read values from a different YAML file

I'm using Puppet 3 and I have X.yaml and Y.yaml. X.yaml contains profiles::resolv_conf::nameservers: [ '1.1.1.1', '8.8.8.8', '2.2.2.2' ]. I want to add that [ '1.1.1.1', '8.8.8.8', '2.2.2.2' ] as the value of servers:, which is in Y.yaml: …

devops puppet sre

asked Apr 04 '22 at 16:32 by Codemypath

0 votes, 1 answer

Flink 1.14.3 - [issue] failed to bind to /0.0.0.0:6123

We are using version 1.14.3 of Flink, and when we try to run the JobManager, we get the exception below. I tried adding akka.remote.netty.tcp.hostname = "127.0.0.1" to the flink-conf.yml file and even replaced the IP with the hostname, but didn't…

devops apache-flink flink-streaming flink-batch sre

asked Feb 02 '22 at 10:49 by Vinayraj007
