Questions tagged [reliability]

Reliability is the ability of a system or component to perform its required functions under stated conditions for a specified period of time.

305 questions
1
vote
0 answers

Cohen's kappa functions from two packages (irr, DescTools) give different results

I am trying to compute a weighted kappa with confidence intervals in R. I am having trouble understanding why the two functions, DescTools::CohenKappa and irr::kappa2, give different outputs. I need to use the former (DescTools::CohenKappa) as I…
Dani
  • 161
  • 9
1
vote
1 answer

Why does the Weibull reliability analysis beta formula in Python not match the normal slope of the regression line?

I am new to reliability analysis and found that the Weibull package in Python is useful for my analysis. I am trying my best to understand the formulas used in the Weibull package. Business problem: there are a few engine parts with Failure_Time and Failure_Type, I…
1
vote
1 answer

Problem with grouped psych::alpha within the do::dplyr/tidyverse and broom::tidy

I have survey data collected using the same questionnaire in different languages. I would like to write elegant dplyr/tidyverse code to compute the reliability for each language, using psych::alpha within. Let's imagine that the data frame (df) looks…
1
vote
2 answers

How can I ensure that a series of data modification operations are all performed?

I have a series of data modification operations to perform, such as: 1. update table_a set value=1 where id=1; 2. update table_b set value=2 where id=1; 3. update table_c set value=3 where id=1. I want to ensure that all three operations complete. I know…
ChenZhou
  • 102
  • 9
1
vote
1 answer

How to recover a critical Python job from system failure

Is there any Python library that would provide a (generic) job state journaling and recovery functionality? Here's my use case: data received to start a job; job starts processing; job finishes processing. I then want to be able to restart a job back…
Garrett Motzner
  • 3,021
  • 1
  • 13
  • 30
1
vote
2 answers

Which distribution do I use to simulate "random" program crashes?

I want to test a distributed program's resistance to random crashes of each node. I need to use some random distribution that would control how often each node would crash - for example, it might be a normal distribution with an average of 2 hours or…
sharptooth
  • 167,383
  • 100
  • 513
  • 979
1
vote
3 answers

How do I intentionally make an Azure role crash?

I want to make a Windows Azure application as fault-resistant as possible and I need to be able to make roles crash intentionally to test how the whole application recovers from such crashes. I guess I could insert code right into the role that would…
sharptooth
  • 167,383
  • 100
  • 513
  • 979
1
vote
3 answers

How do I make my Windows Azure application resistant to a catastrophic Azure datacenter event?

AFAIK Amazon AWS offers so-called "regions" and "availability zones" to mitigate the risk of a partial or complete datacenter outage. It looks like if I have copies of my application in two "regions" and one "region" goes down, my application still can…
sharptooth
  • 167,383
  • 100
  • 513
  • 979
1
vote
1 answer

How to make nginx run a script whenever there's a 502 error

So here's what I'm trying to do. Whenever nginx returns a 502, we want it to send a message to our dashboard with the URL path that returned a 502. Then in our dashboard we will display, in descending order, the URLs that returned 502 and their…
edmamerto
  • 7,605
  • 11
  • 42
  • 66
1
vote
2 answers

Prevent cascading failures in Kafka consumers

Imagine you have a Kafka consumer group with 3 members (M1, M2, and M3). Each member is running in its own process, and each currently has one partition assigned (Pa, Pb, and Pc). M1 receives a poison message from Pa which is crafted such that it…
RB.
  • 36,301
  • 12
  • 91
  • 131
1
vote
1 answer

Are SQL transactions reliable over the internet?

I have a cloud-based solution (an Azure Function that reads JSON from Service Bus and converts it to a C# object) that inserts data into an on-prem SQL Server with multiple insert statements. Are SQL transactions reliable and secure over the internet? What if…
user1222614
  • 91
  • 1
  • 1
  • 3
1
vote
2 answers

What approach can I use to notify myself of jobs that have not run on schedule for any reason (OOM, etc.)?

So I have quite a few workers that execute at frequencies ranging from daily to hourly. There have been incidents where a few of them simply did not execute, without any sign of failure. I need to come up with a solution to track these. I…
1
vote
1 answer

RabbitMQ dead letter handling guarantees

If I use publisher confirms, I can be (reasonably) sure that a message sent to an exchange on the RabbitMQ server, and for which an ACK was received from the RabbitMQ server, is not lost even if the RabbitMQ server crashes (a power outage, for example). However…
John Donn
  • 1,718
  • 2
  • 19
  • 45
1
vote
0 answers

Reliable notifications on iOS without using push notifications

I'm looking for a way to reliably implement notifications for an iOS app without using the Apple push notification service. Due to some constraints on the app - it's not a centralized app, the company sells server software to customers who run on…
DeducibleSteak
  • 1,398
  • 11
  • 23
1
vote
1 answer

Is creating different executable components for a piece of software good practice?

Suppose I am creating software in language A and some of the SDKs I need are available only in language B. So I create two executables and connect them using a socket. What issues can arise in terms of reliability and security? Is this…
Mohit
  • 27
  • 1
  • 10