I'm trying to implement the empirical distribution function from a paper that has the MATLAB implementation on page 3. Here's my Python version of it.

I converted it following the NumPy for MATLAB users documentation, taking into account how statsmodels implements ECDF:

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.distributions.empirical_distribution import ECDF

def ecdf_representation(D, n):
    """calculate ECDF from D at n points"""
    m = np.mean(D)
    X = []
    for d in xrange(D.shape[0] + 1):
        func = ECDF([D[:, d] + np.random.randn(np.shape(D[:, d])) * 0.01])
        ll = func(np.linspace(0, 1, n))
        X = [X, ll]
    X = [X, m]
    plt.plot(X)
    plt.show()
    return X

I get the error:

line 25, in ecdf_representation
func = ECDF([D[:, d] + np.random.randn(np.shape(D[:, d]))])
IndexError: too many indices for array

Doesn't D.shape[0] return the number of columns? So, D[:, d] should work right? What's going on here?

1 Answer

D.shape[0] returns the number of rows, not columns. See:

What does .shape[] do in "for i in range(Y.shape[0])"?

D.shape[1] returns the number of columns, provided D is two-dimensional.
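
To make this concrete, here is a minimal sketch of what `shape` reports for a 2-D and a 1-D array. The arrays `D` and `v` below are made-up placeholders, not your data. Note that a 1-D array has a one-element shape tuple, so asking for `shape[1]` or indexing it with `[:, d]` raises `IndexError`.

```python
import numpy as np

# Hypothetical 2-D array: 554 rows, 5 columns
D = np.zeros((554, 5))
print(D.shape[0])  # number of rows
print(D.shape[1])  # number of columns

# Hypothetical 1-D array: its shape tuple has a single entry
v = np.zeros(554)
print(v.shape)

try:
    v.shape[1]          # there is no second dimension
except IndexError as exc:
    print("shape[1] failed:", exc)

try:
    v[:, 0]             # two indices on a 1-D array
except IndexError as exc:
    print("v[:, 0] failed:", exc)
```

If your array turns out to be 1-D, something like `np.atleast_2d(v)` or `v.reshape(-1, 1)` gives it a second axis, depending on whether you want one row or one column.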

Flynn
  • That solved the error in the OP, but now I get `IndexError: tuple index out of range`. Any thoughts on why that's happening? Also, thanks for the clarification. –  Jun 30 '17 at 13:37
  • The line it happens on is `for d in xrange(D.shape[1] + 1):` –  Jun 30 '17 at 13:40
  • Try range instead of xrange. Or are you using a huge amount of data? – Flynn Jun 30 '17 at 13:45
  • I still get `for d in range(0, D.shape[1] + 1): IndexError: tuple index out of range`. The data frame I'm passing in has about 554 rows. 5 columns though. –  Jun 30 '17 at 13:47
  • Printing `D.shape[1]` returns `IndexError: tuple index out of range` as well. –  Jun 30 '17 at 13:49
  • You'll want to verify that D is a matrix first. And then don't run it through a function to test, but just print D and then print D.shape[1] to test. – Flynn Jun 30 '17 at 13:51
  • It's actually an array. I changed it to `D.shape[0]` and I get `func = ECDF(([D[:, d] + np.random.randn(np.shape(D[:, d])) * 0.01])) IndexError: too many indices for array` –  Jun 30 '17 at 14:15
  • The MATLAB code was written for a matrix, so you might need to look things over. Even if you don't want it to be a matrix in the end, it might be best to get it working the same way the MATLAB code works, and then go back and make edits. – Flynn Jun 30 '17 at 14:18
  • I changed some of my other functions to pass in a matrix. However, the line `func = ECDF(([D[:, d] + np.random.randn(np.shape(D[:, d])) * 0.01]))` now gets `TypeError: 'tuple' object cannot be interpreted as an index`. I'm guessing `np.shape(D[:, d])` returns a tuple. How would I change that to conform to Python? –  Jun 30 '17 at 14:52
  • Yep, so now you want to change things to work for a matrix and not an index. So change to: – Flynn Jun 30 '17 at 15:09
  • for d in ... remove range – Flynn Jun 30 '17 at 15:10
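
Pulling the thread together, the errors above seem to come from three things: looping over `D.shape[0] + 1` instead of the column count, passing a shape tuple to `np.random.randn` (which expects separate integers), and nesting Python lists with `[X, ll]` where MATLAB would concatenate. The following is only a sketch under assumptions, not the paper's method: `ecdf_at` is a small NumPy stand-in for statsmodels' `ECDF` so the example has no extra dependency, the `noise` and `rng` parameters are illustrative additions, the overall mean is kept as in the question, and plotting is left out.

```python
import numpy as np

def ecdf_at(sample, points):
    """Right-continuous empirical CDF of `sample`, evaluated at `points`.

    Stand-in for statsmodels' ECDF (an assumption made for this sketch).
    """
    sample = np.sort(np.asarray(sample, dtype=float))
    # Count the sample values <= each point, then divide by the sample size
    return np.searchsorted(sample, points, side="right") / float(sample.size)

def ecdf_representation(D, n, noise=0.01, rng=None):
    """Concatenate per-column ECDFs of D sampled at n points, then mean(D)."""
    D = np.asarray(D, dtype=float)
    if D.ndim != 2:
        raise ValueError("expected a 2-D array, got shape %s" % (D.shape,))
    rng = np.random if rng is None else rng
    grid = np.linspace(0, 1, n)
    parts = []
    for d in range(D.shape[1]):  # iterate over columns, with no "+ 1"
        col = D[:, d]
        # standard_normal accepts a shape tuple, unlike randn
        jitter = rng.standard_normal(col.shape) * noise
        parts.append(ecdf_at(col + jitter, grid))
    parts.append([np.mean(D)])  # overall mean, as in the question's code
    return np.concatenate(parts)  # flat array instead of nested lists

# Usage with made-up data of the size mentioned in the comments
D = np.random.RandomState(0).rand(554, 5)
X = ecdf_representation(D, 10)
print(X.shape)  # (51,): 5 columns * 10 points + 1 mean
```

If you prefer to keep `np.random.randn`, unpacking the tuple with `np.random.randn(*col.shape)` should also avoid the `TypeError`.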