
I want to read a CSV file into an ndarray with a variable number of rows and 6 columns. The first row contains the column names. I used this:

data = np.genfromtxt('attack-traffic.csv', dtype=float, delimiter=',', names=True)

But when I run

print(data.shape)

it gives me

(1680,)

How can I read the file so that the array has 6 columns?

  • By passing in `names=True`, you're creating a [structured array](http://docs.scipy.org/doc/numpy-1.10.1/user/basics.rec.html). Your 1680 records will each have 6 fields. – Oliver W. May 25 '16 at 10:31
  • @OliverW. Then why doesn't the shape give me (1680, 6)? I'm new to NumPy. – Eman Bany salameh May 25 '16 at 10:35

2 Answers


By passing in names=True to genfromtxt, you're creating a structured array. Your 1680 records will each have 6 fields.

Example:

oliver@armstrong:/tmp$ cat sto.txt 
id,num
1,1.2
2,2.4
oliver@armstrong:/tmp$ python
Python 2.7.3 (default, Jun 22 2015, 19:33:41) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> a = np.genfromtxt('/tmp/sto.txt', names=True, delimiter=',')
>>> a.shape
(2,)
>>> a[0]  # each record has 2 fields
(1.0, 1.2)
>>> a[0].dtype
dtype([('id', '<f8'), ('num', '<f8')])
>>> a[0]['num']
1.2

You can also access a single field by name. Its shape tells you how many elements that field holds:

>>> a['num'].shape
(2,)
>>> a['num']
array([ 1.2,  2.4])

Definitely read the documentation on structured arrays (http://docs.scipy.org/doc/numpy-1.10.1/user/basics.rec.html) if you want to know more. It is full of good examples.
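To see where the "6 columns" went in a structured array, it may help to look at the dtype rather than the shape. The sketch below reuses the two-column example from above; the in-memory `csv_text` string is a hypothetical stand-in for the real file, and the variable names are only for illustration.

```python
import io
import numpy as np

# Hypothetical in-memory stand-in for the CSV file (header row plus two records).
csv_text = "id,num\n1,1.2\n2,2.4\n"

a = np.genfromtxt(io.StringIO(csv_text), names=True, delimiter=',')

n_records = a.shape[0]        # number of rows (records)
field_names = a.dtype.names   # column names taken from the header row
n_fields = len(field_names)   # number of columns (fields per record)

print(n_records, n_fields, field_names)
```

For the file in the question, the same check should report 1680 records and 6 field names, even though `data.shape` is `(1680,)`.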

If you know a priori that all elements in that file are floats (and by passing dtype=float, you indicate that you know), you could convert the structured array to a normal ndarray:

>>> a = np.genfromtxt('/tmp/sto.txt', names=True, delimiter=',', dtype=float) # added `dtype=float` in the call
>>> b = a.view(np.float64).reshape(a.shape[0], -1)
>>> b
array([[ 1. ,  1.2],
       [ 2. ,  2.4]])

Note that this returns a view, so any change you make to a will be reflected in b and vice versa.
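Two other routes to a plain 2-D array are sketched here, again with a hypothetical in-memory `csv_text` in place of the real file. The first skips the header line instead of parsing it as names. The second keeps the names and converts afterwards with `structured_to_unstructured`, which assumes NumPy 1.16 or later.

```python
import io
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical in-memory stand-in for the CSV file.
csv_text = "id,num\n1,1.2\n2,2.4\n"

# Option 1: ignore the header row, so genfromtxt returns an ordinary 2-D array.
plain = np.genfromtxt(io.StringIO(csv_text), delimiter=',', skip_header=1)

# Option 2: keep the field names, then convert the structured array.
# Assumes NumPy >= 1.16, where structured_to_unstructured is available.
structured = np.genfromtxt(io.StringIO(csv_text), delimiter=',', names=True)
converted = rfn.structured_to_unstructured(structured)

print(plain.shape)
print(converted.shape)
```

With either result, positional indexing such as `plain[:, 0]` works as it would on any ordinary ndarray. The column names are lost in option 1, so option 2 may be preferable when both access styles are needed.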

Oliver W.
  • OK, got it. But why does calling `plt.plot(data[:,0], data[:,1], 'ob', alpha=0.2, markersize=4)` give me `IndexError: too many indices`? – Eman Bany salameh May 25 '16 at 10:45
  • @EmanBanysalameh When you have a structured array, you can simply call the columns by passing the names (that were obtained from the file). Just call `plt.plot(data['field1'], data['field2'])` where `field1` and `field2` are the names of the columns in the file. In my example, I would have used `plt.plot(a['id'], a['num'])`. – Oliver W. May 25 '16 at 10:48

Try pandas:

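import pandas as pd
df = pd.read_csv('attack-traffic.csv')  # the first row is used as column names
print(df.columns)  # check that all 6 column names were picked up
print(df.head())   # inspect the first few rows

Then check with NumPy: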

import numpy as np
data = np.genfromtxt('attack-traffic.csv', dtype=float, delimiter=',')

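Note that without names=True the header row is parsed as data, so with dtype=float it shows up as nan. Try dtype=str to inspect the raw values.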
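If the pandas route reads the file cleanly, the DataFrame can also be turned into an ordinary 2-D ndarray. This is a minimal sketch with a hypothetical in-memory `csv_text` in place of 'attack-traffic.csv'; it assumes every column is numeric and that pandas 0.24 or later is installed (for `to_numpy`).

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for the CSV file.
csv_text = "id,num\n1,1.2\n2,2.4\n"

df = pd.read_csv(io.StringIO(csv_text))

# Convert to a plain float ndarray; this assumes all columns are numeric.
data = df.to_numpy(dtype=float)

print(list(df.columns))
print(data.shape)
```

The resulting array has one row per record and one column per CSV column, so `data[:, 0]` style indexing is available, while the names stay on `df.columns`.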

Merlin