
I'm trying to read a CSV file with the following line:

raw_data = genfromtxt(datafile,delimiter='\t',dtype=None)

When the file contains string data, this function reads it into a record array. As far as I understand, with dtype=None the file should always be read into a record array. Is that correct?

However, if there is no string data and the file contains only numeric data, this function returns a plain ndarray instead.

If not, is there a convenient way to force this function to read the file as a record array?

The problem with an ndarray is that all my code is built to process record arrays.
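The behavior described above can be reproduced with a small in-memory sample. In this sketch io.StringIO stands in for datafile and the sample contents are made up; it assumes NumPy 1.14 or newer, where the encoding parameter exists (passing it avoids a warning when strings are read). As far as I can tell from how genfromtxt works, with dtype=None the deciding factor is whether every column ends up with the same type, not whether strings are present, so the int/float sample below also comes back as a structured array.

```python
import io

import numpy as np


def read(text):
    # Same call as in the question, but reading from an in-memory "file".
    return np.genfromtxt(io.StringIO(text), delimiter="\t", dtype=None, encoding="utf-8")


with_strings = read("a\t1.5\nb\t3.5\n")   # str + float columns
int_and_float = read("1\t2.5\n3\t4.5\n")  # int + float columns, no strings
all_float = read("1.5\t2.5\n3.5\t4.5\n")  # every column is float

# Structured results are 1-D with default field names f0, f1, ...;
# a uniform result is a plain 2-D array whose dtype has no field names.
for arr in (with_strings, int_and_float, all_float):
    print(arr.shape, arr.dtype.names)
```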

UPD1: In case anyone else needs this, here is a brief solution. It may not be the best one, but at least it works:

Read the CSV file into an ndarray:

from numpy import genfromtxt
raw_data = genfromtxt(datafile, delimiter='\t', dtype=None)

Generate default names and data types for the columns:

names_ = ['f' + str(i) for i in range(raw_data.shape[1])]
names = [(name, raw_data.dtype) for name in names_]

And finally, create the record array:

raw_data_as_ra = raw_data.ravel().view(names)
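The workaround above can be wrapped so it only converts when genfromtxt actually returned a plain array, and so the result is an np.recarray (attribute access such as rec.f0) rather than a structured ndarray. A minimal sketch: as_record_array is a hypothetical helper name, the sample data is made up, and it assumes a plain input is 2-D (at least two rows and two columns).

```python
import io

import numpy as np


def as_record_array(raw_data):
    """View genfromtxt output as an np.recarray with fields f0, f1, ..."""
    if raw_data.dtype.names is not None:
        # Mixed column types: genfromtxt already built fields f0, f1, ...
        return raw_data.view(np.recarray)
    # Uniform column type: same default names, all sharing the array's dtype.
    fields = np.dtype([("f" + str(i), raw_data.dtype) for i in range(raw_data.shape[1])])
    # ravel() flattens in row-major order (copying only if needed), so each
    # run of shape[1] consecutive values is reinterpreted as one record.
    return raw_data.ravel().view(fields).view(np.recarray)


numeric = np.genfromtxt(io.StringIO("1.5\t2.5\n3.5\t4.5\n"), delimiter="\t", dtype=None, encoding="utf-8")
mixed = np.genfromtxt(io.StringIO("a\t1.5\nb\t3.5\n"), delimiter="\t", dtype=None, encoding="utf-8")

num_rec = as_record_array(numeric)   # converted from a plain 2-D array
mixed_rec = as_record_array(mixed)   # already structured, only re-viewed
```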
drsealks
  • Just specify the desired dtype maybe? – Lev Levitsky Apr 14 '14 at 09:08
  • I read a different CSV file every time; it can have thousands of columns, and I don't know in advance what kind of data it contains. – drsealks Apr 14 '14 at 09:10
  • And what exactly is the problem with the ndarray? Is it that it converts ints to floats? or am I missing something bigger? – Lev Levitsky Apr 14 '14 at 09:13
  • Sorry, I forgot to mention that all my further analysis of these files is built around record arrays, to cover the general case where not only numeric data is present. – drsealks Apr 14 '14 at 09:18
  • Maybe it's worth showing what exactly doesn't work in your processing code. – Lev Levitsky Apr 14 '14 at 09:20
  • Well, for example, an ndarray does not have column names by default. It seems the easiest way here is to generate these names manually and convert the ndarray to a record array. – drsealks Apr 14 '14 at 09:21
  • Well, `genfromtxt` has a `names` parameter, if that helps. – Lev Levitsky Apr 14 '14 at 09:24
  • Yes, I know. But again, I don't know how many columns I will have before reading the file, so I can't generate the names array in advance. That means I'll likely have to convert the ndarray to a record array manually. Anyway, thanks for your suggestions. – drsealks Apr 14 '14 at 09:27
  • I'm having trouble imagining how you can use the field names in your code if you don't know them in advance and they're not in the file. Maybe if you show your code you get more accurate suggestions. – Lev Levitsky Apr 14 '14 at 09:30
  • If this function reads a CSV file that contains strings, it generates field names automatically in the form ['f0','f1',...,'fn']. I've already found a way to convert the result to a record array. – drsealks Apr 14 '14 at 09:34
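As an alternative to building the dtype by hand, NumPy 1.16 added numpy.lib.recfunctions.unstructured_to_structured, which turns the last axis of a plain array into named fields. A minimal sketch, assuming NumPy 1.16 or newer and a 2-D input; the sample data is made up and the f0, f1, ... names simply mimic genfromtxt's defaults.

```python
import io

import numpy as np
from numpy.lib import recfunctions as rfn

raw = np.genfromtxt(io.StringIO("1\t2\t3\n4\t5\t6\n"), delimiter="\t", dtype=None, encoding="utf-8")

if raw.dtype.names is None:
    # Plain 2-D array: each column becomes one field of the output records.
    raw = rfn.unstructured_to_structured(raw, names=[f"f{i}" for i in range(raw.shape[1])])

rec = raw.view(np.recarray)
```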

1 Answer


You could use recfromcsv, which is derived from genfromtxt, instead:

If your file looks like:

col1,col2,col3
1.1, 2.4, 3.2
4.1, 5.2, 6.3

Then this call:

import numpy as np
a = np.recfromcsv('yourfile.csv')

gives:

rec.array([(1.1, 2.4, 3.2), (4.1, 5.2, 6.3)], 
      dtype=[('col1', '<f8'), ('col2', '<f8'), ('col3', '<f8')])

Note that recfromcsv uses the first row as the column (field) names, lower-cased by default.
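If I read the NumPy source correctly, recfromcsv is essentially genfromtxt called with names=True, delimiter=',', dtype=None and case_sensitive='lower', followed by a view as np.recarray. As far as I know, recfromcsv was also removed from the top-level numpy namespace in NumPy 2.0, so the same call written directly with genfromtxt may be useful. A sketch using the sample file above, with io.StringIO standing in for the file:

```python
import io

import numpy as np

csv_text = "col1,col2,col3\n1.1, 2.4, 3.2\n4.1, 5.2, 6.3\n"

# Header row -> lower-cased field names; per-column type detection;
# then a recarray view so fields are reachable as attributes.
a = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=",",
    names=True,
    case_sensitive="lower",
    dtype=None,
    encoding="utf-8",
).view(np.recarray)

print(a.col2)
```

Any other genfromtxt keyword, such as delimiter='\t', can be passed the same way.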

recfromcsv also accepts the same keyword arguments as genfromtxt (e.g. delimiter). If your file is tab-delimited, your line of code might look like this:

np.recfromcsv(datafile, delimiter='\t')
Lee
  • Thanks! I had almost forgotten about this function. I hope it can also read CSV files without a header row of names. Thanks again! – drsealks Apr 16 '14 at 04:53
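On reading CSV without names: recfromcsv defaults to names=True, so on a headerless file the first data row would be taken as field names. Passing names=None avoids that, but an all-numeric file then comes back as a plain 2-D array without fields, as described in the question. One possible workaround is to count the columns from the first line and pass explicit names, which makes genfromtxt return a structured array regardless of column types. A sketch: read_headerless is a hypothetical helper, not a NumPy function; it assumes the first line is a data row (no comments or blank lines before it) and that the sample data stands in for a real file.

```python
import io

import numpy as np


def read_headerless(lines, delimiter="\t"):
    """Read headerless delimited text into an np.recarray with fields f0, f1, ..."""
    lines = list(lines)  # consume the input once; genfromtxt also accepts a list of str
    n_cols = len(lines[0].rstrip("\r\n").split(delimiter))
    names = [f"f{i}" for i in range(n_cols)]
    # With explicit names, genfromtxt builds a structured array even when
    # every column has the same type (and does not treat line 1 as a header).
    data = np.genfromtxt(lines, delimiter=delimiter, names=names, dtype=None, encoding="utf-8")
    return data.view(np.recarray)


sample = read_headerless(io.StringIO("1.5\t2.5\n3.5\t4.5\n"))
```

With a real file, something like `with open(datafile) as fh: rec = read_headerless(fh)` should work the same way.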