
I have two CDFs and need to find the maximum pointwise distance between them. I created histograms and plotted both. The values come from a random function that sums two random numbers from 1 to 6, repeated 100 times, like rolling two dice. However, I can't work out how to find the maximum distance between the two lines on the plot.

On the first run I get a dictionary of 100 observations: dicesum = {1: 5, 2: 8, 3: 7, ..., 100: 4}.

The keys 1 to 100 are the roll numbers and the values are the sums. I generated the histogram with this code:

import matplotlib.pyplot as plt

# dicesum maps roll number -> sum of the two dice
keys, values = zip(*dicesum.items())
plt.hist(values, bins=30)
plt.gca().set(title='Frequency Histogram', ylabel='Frequency')
plt.show()

The histogram: [image: histogram]

Now I plot the CDF with this code:

import numpy as np

x = np.sort(values)
# cumulative fraction of observations <= x, running from 1/n up to 1
y = np.arange(1, len(x) + 1) / float(len(x))
plt.plot(x, y, color='b')
plt.xlabel('Sum')
plt.ylabel('CDF')
plt.show()

[image: CDF plot]

Now I plot two observations in the same plot to see how they differ:

[image: plot of the two observations]

Now I want the maximum distance between them, i.e. the point at which they are furthest apart.

JohanC
  • By CDF I assume you mean Cumulative Distribution Function? Can you not just subtract the two and look for the maximum? e.g. something like `imax = np.argmax(np.abs(y1-y2))`. (That'll tell you at which point, then you can just evaluate at that point to get the max distance.) – sh37211 Dec 14 '19 at 21:44
  • Yes, I mean the cumulative distribution function. As you can see in the last picture, there are two lines from two separate observations. Now I have to find the point at which they are furthest from each other. I tried what you suggested, but it doesn't seem to work in my case. Thanks. – user12538312 Dec 14 '19 at 22:43

1 Answer


To measure the distance between two such CDFs, you can use the two-sample Kolmogorov–Smirnov test, which tests whether the two samples come from the same distribution. Its test statistic is exactly the maximum pointwise distance between the two empirical CDFs, so you can also compute that distance directly.
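
One possible way to compute that maximum pointwise distance directly is sketched below. It is only an illustration under some assumptions: the helper names `ecdf_at`, `max_cdf_distance` and `roll_sums` are made up for this example, and `values1` / `values2` stand in for the two lists of 100 dice sums from the question. The key point is that both empirical CDFs have to be evaluated at the same x positions before subtracting; the two sorted samples generally do not share x values, so subtracting the two `y` arrays element by element does not compare the curves at the same sum.

```python
import random
from bisect import bisect_right


def ecdf_at(sorted_sample, x):
    """Empirical CDF at x: fraction of observations <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def max_cdf_distance(sample1, sample2):
    """Return (distance, location) of the largest gap between two empirical CDFs."""
    s1, s2 = sorted(sample1), sorted(sample2)
    # Both ECDFs are step functions that only change at observed values,
    # so comparing them on the union of observed values is sufficient.
    grid = sorted(set(s1) | set(s2))
    gaps = [abs(ecdf_at(s1, x) - ecdf_at(s2, x)) for x in grid]
    # Index of the largest gap (the first one if there are ties).
    i = max(range(len(grid)), key=gaps.__getitem__)
    return gaps[i], grid[i]


def roll_sums(n, rng):
    """Hypothetical stand-in for the question's generator: n sums of two dice."""
    return [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n)]


rng = random.Random(0)
values1 = roll_sums(100, rng)
values2 = roll_sums(100, rng)

distance, location = max_cdf_distance(values1, values2)
print(f"max distance {distance:.3f} at sum = {location}")
```

If SciPy is available, `scipy.stats.ks_2samp(values1, values2)` reports the same distance as its `statistic`, together with a p-value. With NumPy, `np.searchsorted(sorted_sample, grid, side='right') / len(sorted_sample)` evaluates the empirical CDF on the whole grid at once.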

Saad