Questions tagged [cdf]

CDF is an acronym for cumulative distribution function. While the pdf gives the probability density of each value of a random variable, the cdf (often denoted F(x)) gives the probability that the random variable will be less than or equal to a specified value.

A cumulative distribution function describes the probability that a real-valued random variable X with a given probability distribution will be found at a value less than or equal to x.

The cdf of a discrete random variable is the sum of its probability mass function (pmf) over all values less than or equal to x. If the random variable is continuous, the cdf is the integral of the probability density function (pdf) from negative infinity up to x.
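The relationship between the cdf and the pmf or pdf can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the binomial and standard normal distributions are chosen arbitrarily, and the helper names (binomial_pmf, binomial_cdf, normal_cdf_by_integration) are hypothetical, not part of any library. Only the standard library is used.

```python
import math
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """P(X = k) for a binomial(n, p) random variable."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_cdf(k, n, p):
    """P(X <= k): sum the pmf over all outcomes 0..k."""
    return sum(binomial_pmf(i, n, p) for i in range(0, k + 1))


def normal_cdf_by_integration(x, lower=-10.0, steps=20_000):
    """Approximate P(X <= x) for a standard normal by integrating the pdf.

    Uses the trapezoidal rule on [lower, x]; the mass below `lower`
    is assumed to be negligible.
    """
    pdf = NormalDist().pdf
    h = (x - lower) / steps
    total = 0.5 * (pdf(lower) + pdf(x))
    total += sum(pdf(lower + i * h) for i in range(1, steps))
    return total * h


# Discrete case: cdf as a running sum of the pmf.
discrete_value = binomial_cdf(3, 10, 0.5)

# Continuous case: cdf as an integral of the pdf, compared with the
# closed-form cdf provided by the standard library.
continuous_value = normal_cdf_by_integration(1.0)
reference_value = NormalDist().cdf(1.0)
```

The numerical integral should agree closely with NormalDist().cdf, which is the point of the sketch: the cdf accumulates probability from the left.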


In applied statistics, cdfs are important for comparing distributions: they underlie plots (e.g., P-P plots) and hypothesis tests (e.g., the Kolmogorov-Smirnov test).
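As a rough illustration of how cdfs are used to compare distributions, the sketch below builds an empirical cdf (ECDF) from a sample and computes the two-sample Kolmogorov-Smirnov statistic, i.e. the largest vertical gap between two ECDFs. The function names ecdf and ks_statistic and the sample values are assumptions made for this example; no p-value is computed, and a statistics library would normally be used for a full test.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a function F where F(x) is the fraction of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def F(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(ordered, x) / n

    return F


def ks_statistic(sample_a, sample_b):
    """Largest vertical gap between the two empirical cdfs.

    Both ECDFs are step functions that only change at observed values,
    so it is enough to check the pooled sample points.
    """
    F_a, F_b = ecdf(sample_a), ecdf(sample_b)
    points = sorted(set(sample_a) | set(sample_b))
    return max(abs(F_a(x) - F_b(x)) for x in points)


# Hypothetical samples, chosen only to keep the arithmetic easy to follow.
a = [1, 2, 3, 4]
b = [3, 4, 5, 6]
d = ks_statistic(a, b)
```

Calling ecdf(sample)(x) gives the cumulative proportion of observations at or below x, which is the same quantity an ECDF plot displays on its vertical axis.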

Strongly related to


Common Data Format

Please note that CDF is also an acronym for Common Data Format; see NASA's documentation for more details.

341 questions
2
votes
2 answers

SciPy Cumulative Distribution Function Plotting

I am having trouble plotting a cumulative distribution function. So far I have found this: scipy.stats.beta.cdf(0.2,6,7) But that only gives me a point. This will be what I use to plot: pylab.plot() pylab.show() What I want it to look like is…
Overtim3
  • 85
  • 1
  • 2
  • 13
1
vote
1 answer

Label ECDF plot points

I'm trying to label the points of an ECDF plot with another column from my data field. Currently I'm using this: untouched = read.table("results-untouched.tsv", sep="\t") plot.ecdf(untouched$V4, xlim=c(0.75,1.25), ylim=c(0,1), col='green',…
cdecker
  • 4,515
  • 8
  • 46
  • 75
1
vote
2 answers

Which API or framework to use for building dashboards using Pentaho?

I want to build a dashboard whose back end will be Pentaho Community Edition. I am using Mondrian and MDX queries. I have tried to use Pentaho CDF for building the dashboard, but it seriously lacks documentation and without proper documentation its…
Shekhar
  • 11,438
  • 36
  • 130
  • 186
1
vote
1 answer

How to know number of rows affected by CDF merge in pyspark?

I have CDF logic where I need to know the number of rows affected by a merge, i.e. the number of inserted, updated, and deleted rows, in order to make some decisions. I am able to get the required information in SQL, but did not get in…
1
vote
1 answer

Multiple cumulative cdf plots

I have multiyear data (14 years) at 15-minute intervals and I am trying to make multiple plots comparing short-term and long-term CDF values. I am computing the cumulative distribution function (CDF) of one variable (Energy values) for each…
Jawairia
  • 295
  • 4
  • 14
1
vote
2 answers

Finding Percentiles and Values From Calculated Gamma Distribution

Background I am working on computing a series of best-fit gamma curves for a 2-D dataset in Numpy (ndarray), a prior question for the genesis of this can be found here. Scipy was previously utilized (scipy.gamma.stats), however this library is not…
TornadoEric
  • 399
  • 3
  • 16
1
vote
1 answer

How can I get the intersection point of two crossing ecdfs in R?

I have two ecdf plots using below code: ecdf1 <- ecdf(data1) ecdf2 <- ecdf(data2) These plots are crossing each other. I need to take the crossing point (intersection point) coordinates. How should I do this in R?
Nmgh
  • 113
  • 7
1
vote
0 answers

How to find the cumulative distribution function of storm durations from a precipitation time series?

I have two precipitation time series and I want to plot the empirical cumulative distribution function(CDF) of storm duration for both cases on the same plot. Can I get the storm durations in a list form somehow? And then how to plot the CDF of…
mir farhan
  • 11
  • 2
1
vote
1 answer

R ggplot two cumulative distribution functions in the same plot are not correctly displayed

I wanted to graphically illustrate an example of how the continuously ranked probability score is calculated. For that, I need two (normal) cumulative probability distributions in one plot, the predicted cdf(blue) and the observed cdf(red). The…
Jonas S
  • 11
  • 2
1
vote
0 answers

getting values from a CDF

Good morning, everyone. I have a set of values. Arr = np.array([0.11, 0.14, 0.22, 0.26, 0.31, 0.36, 0.44, 0.69, 0.70, 0.70, 0.70, 0.75, 0.98, 1.40]) I have constructed the CDF function in this way: def ecdf(a): x, counts = np.unique(a,…
1
vote
1 answer

Pick values from a CDF curve

everyone, I have a generic values distribution. I post the graph. Is there a way to generate a CDF from these values? Using sns I can create a graph: My goal is to assign a value to the y-axis and take a value from the x-axis from the CDF. I'm…
1
vote
1 answer

why the cumsum of pdf and cdf are different for scipy.stats.norm?

I tried to compare the cdf values obtained from the cumsum of the pdf with the cdf of scipy.stats.norm. Why are these different? #%% import numpy as np from scipy.stats import norm x=np.arange(10) m=np.mean(x) # mean of x v=np.var(x,ddof=1) #…
agile
  • 77
  • 5
1
vote
0 answers

Legend title gets cut off

I make a plot of Cumulative distribution functions of seven distributions. However, it does not show all the elements of the legend. The plot looks like this: The distribution does not show "Distribution" in full. Can anyone tell me how to solve…
Simon
  • 15
  • 4
1
vote
0 answers

Computing cumulative proportions from samples

I want to compute cumulative proportions of samples. R> x=c(92, 3, 1, 4, 15, 4) R> ecdf(x)(x) [1] 1.0000000 0.3333333 0.1666667 0.6666667 0.8333333 0.6666667 The above is a way to compute cumulative proportions as shown in the first figure. I think…
user1424739
  • 11,937
  • 17
  • 63
  • 152
1
vote
1 answer

Can mean() function show probability of cumulative distribution function?

I was doing my assignment, and I found something strange. I did this code for question #1. x <- heights$height[heights$sex=="Male"] and the next question is like this: "We will define a function "CDF" like following: CDF <- function(a)…