I am trying to load a CSV file that contains only float values.

data = np.genfromtxt(self.file,dtype=float,delimiter=self.delimiter,names = True)

but this returns an array of tuples. Based on my search, this should return tuples only for non-homogeneous arrays (see "numpy.genfromtxt produces array of what looks like tuples, not a 2D array, why?"). When I remove `names=True`, it really does return a 2D array. Is it possible to get back an array with names, as in the linked question?

Lines from the csv:

0 _id|1 age|2 unkown|3 male|4 female|5 match-start|6 score
8645632250|7744|0|1|0|1|10

(There are more columns; I only included the first few.)

I also used this code to get better column names:

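def obtain_data(self):
    with open(self.file, 'r') as infile:
        # Read the header row and derive the column labels from it.
        first_line = infile.readline()
        labels = first_line.split('|')
        # trunc_before is a helper defined elsewhere (not shown here).
        labels = list(map(trunc_before, labels))
        # Skip the header row, since the names are passed in explicitly.
        data = np.genfromtxt(self.file, dtype=float, delimiter=self.delimiter, names=labels, skip_header=1)
        return data, np.asarray(labels)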
Pter

1 Answer

It sounds like you're asking whether it's possible to have a standard 2d array while also having named columns. It isn't. (At least not in the sense you seem to be asking.)

An "array with names" is a structured array -- it's an array of records (not really tuples), each of which has named fields. Think of it this way: the names aren't attached to the array, they're attached to the "tuples" -- the records. The fact that the data is of a homogeneous type doesn't matter.
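
To make the distinction concrete, here is a minimal sketch. The three-column sample data and the variable names are made up for illustration and are not the asker's file. It loads a small pipe-delimited table with `names=True`, reads one field by name, and then converts the result to a plain 2D float array with `numpy.lib.recfunctions.structured_to_unstructured`, which assumes NumPy 1.16 or later.

```python
import io

import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical sample data standing in for the real CSV file.
csv_text = "id|age|score\n1|30|10\n2|41|7\n"

# names=True takes the field names from the first row, so the result is a
# 1-D structured array with one record per data row.
data = np.genfromtxt(io.StringIO(csv_text), dtype=float, delimiter="|", names=True)

# The names belong to the record fields, so a column is selected by field name.
ages = data["age"]

# When a regular 2D array is needed, the records can be converted.
# The names stay in data.dtype.names; the plain array does not carry them.
plain = rfn.structured_to_unstructured(data)
column_names = data.dtype.names
```

Keeping `column_names` next to `plain` is one way to retain the labels, but they are then a separate object and not part of the 2D array itself.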

senderle
  • OK, I knew that about arrays of records etc., but I still somewhat thought that it is possible to have an array with column names. It seems that if I want a named array, I need to write my own class or use [Pandas](http://pandas.pydata.org/). By the way, do you know if there is any difference in performance between structured arrays and normal arrays in numpy? – Pter Sep 04 '13 at 12:06
  • Yes, I think Pandas provides support for things like that. Record arrays should perform about as well as regular arrays for most things. There are actually two closely-related forms -- record arrays and structured arrays -- and you can read about the difference [here](http://wiki.scipy.org/Cookbook/Recarray). The main speed concern that I'm aware of involves attribute access to record arrays (which isn't possible with structured arrays). If you have a record array with an 'age' field, you can access it like this: `myarray['age']` _or_ `myarray.age`. But the latter can be slow. – senderle Sep 04 '13 at 12:58
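
The two access styles mentioned in the last comment can be sketched as follows. The field names and values are invented for illustration, and viewing a structured array as `np.recarray` is only one of several ways to obtain a record array.

```python
import numpy as np

# A small structured array with two named float fields (made-up values).
structured = np.array(
    [(1.0, 30.0), (2.0, 41.0)],
    dtype=[("id", float), ("age", float)],
)

# Viewing the same data as a record array adds attribute-style access.
records = structured.view(np.recarray)

by_key = records["age"]   # key access works on structured and record arrays
by_attr = records.age     # attribute access works only on the record array

# The plain structured array has no attribute named after the field.
structured_has_attr = hasattr(structured, "age")
```

Both lookups return the same values; the difference is only in how the field is reached.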