I am trying to plot a CCDF using numpy. The input is a CSV file with #keywords in col[0] and their frequency in col[1].

Input

#Car,45
#photo,4
#movie,6
#life,1

The input has more than 10K rows and two columns. col[0] is not used at all; only the frequency from col[1] is used to plot the CCDF. The data has no empty rows in between, and there is no blank row at the end of the file.

Code:

import numpy as np
import matplotlib.pyplot as plt
from pylab import*
import math
from matplotlib.ticker import LogLocator

data = np.genfromtxt('input.csv', delimiter=",")

d0=data[:,1]
X0 = np.sort(d0)
cdf0 = np.arange(len(X0))/float(len(X0))
#cumulative = np.cumsum(data)
ccdf0 = 1 - cdf0
plt.plot(X0,ccdf0, color='b', marker='.', label='Frequency')

plt.legend(loc='upper right')
plt.xlabel('Freq (x)')
plt.ylabel('ccdf(x)')
plt.gca().set_xscale("log")
#plt.gca().set_yscale("log")
plt.show()

Error

Traceback (most recent call last):
  File "00_plot_ccdf.py", line 17, in <module>
    d0=data[:,1]
IndexError: too many indices for array

Thanks in Advance

Sitz Blogz

1 Answer

By default, genfromtxt treats # as the start of a comment and discards everything after it. Every line in your file starts with #, so every line is dropped and your data ends up empty:

In [1]: genfromtxt('test.csv', delimiter=',')         
/usr/lib/python3/dist-packages/numpy/lib/npyio.py:1385: UserWarning: genfromtxt: Empty input file: "test.csv"
  warnings.warn('genfromtxt: Empty input file: "%s"' % fname)
Out[1]: array([], dtype=float64)

data is therefore an empty 1-dimensional array, and indexing it with two indices, as in data[:,1], raises "too many indices for array".
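As a rough illustration of that failure mode, the sketch below writes a small hypothetical sample file (the file contents and the variable names are assumptions, not your real data), loads it with the default settings, and checks the shape before indexing:

```python
import os
import tempfile
import warnings

import numpy as np

# Hypothetical sample file: every line starts with '#', as in the question.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write("#Car,45\n#photo,4\n")
    sample_path = f.name

with warnings.catch_warnings():
    # Silence the "Empty input file" warning for this demonstration.
    warnings.simplefilter("ignore")
    data = np.genfromtxt(sample_path, delimiter=",")
os.remove(sample_path)

# Inspect the shape before indexing with two indices.
print(data.ndim, data.shape)

try:
    data[:, 1]
    index_error_raised = False
except IndexError:
    # Two indices on a 1-dimensional array: the error from the question.
    index_error_raised = True
print(index_error_raised)
```

Printing data.ndim and data.shape right after loading is a cheap way to catch this kind of problem before the indexing line fails.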

To disable comment handling, pass comments=None to genfromtxt:

In [20]: genfromtxt('test.csv', delimiter=',', comments=None)
Out[20]: 
array([[ nan,  45.],
       [ nan,   4.],
       [ nan,   6.],
       [ nan,   1.]])

Since you only need the second column, you can also restrict the result to that column directly with usecols:

In [21]: genfromtxt('test.csv', delimiter=',', comments=None, usecols=(1,))
Out[21]: array([ 45.,   4.,   6.,   1.])
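Putting the pieces together, here is a minimal sketch of the loading and CCDF steps, assuming a file shaped like the sample in the question. The helper names load_frequencies and empirical_ccdf and the temporary sample file are made up for illustration; the CCDF estimate is the same one used in the question's code (sort, then 1 - i/n for rank i).

```python
import os
import tempfile

import numpy as np


def load_frequencies(path):
    # comments=None stops genfromtxt from discarding text after '#';
    # usecols=(1,) keeps only the frequency column.
    return np.genfromtxt(path, delimiter=",", comments=None, usecols=(1,))


def empirical_ccdf(values):
    # Same estimate as in the question: sort, then 1 - i/n for rank i.
    x = np.sort(values)
    ccdf = 1.0 - np.arange(len(x)) / float(len(x))
    return x, ccdf


# Hypothetical sample file mirroring the rows shown in the question.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write("#Car,45\n#photo,4\n#movie,6\n#life,1\n")
    sample_path = f.name

freq = load_frequencies(sample_path)
x, ccdf = empirical_ccdf(freq)
os.remove(sample_path)

print(x)
print(ccdf)
```

The resulting x and ccdf arrays can then be passed to plt.plot and the log-scaled axis set exactly as in the question's plotting code.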
Ilja Everilä
  • Thank you so much. Let me try this and I will reply back. Thanks again! :) – Sitz Blogz Apr 14 '16 at 08:51
  • Works like a charm. Thank you so much, I had been searching a lot. Maybe I could also use a suggestion on this one: http://stackoverflow.com/questions/36616118/seasborn-distplot-goes-unresponsive – Sitz Blogz Apr 14 '16 at 09:00
  • @SitzBlogz IPython has a nice feature: type a question mark after a class, function, etc. and press Enter (`genfromtxt?` for example), and it will display helpful information. This is especially nice for numpy, scipy, etc., as it prints a very MATLAB-like help text. – Ilja Everilä Apr 14 '16 at 09:08