I can read the data file fine, but as soon as I add the names parameter, either by specifying the names myself or by reading them from the first row, I get back empty strings.

from numpy import genfromtxt

# No names: plain 2d array of strings, header row included as data
data_no_headers = genfromtxt('SimpleDataWithHeaders.csv', delimiter=',', dtype='str', autostrip=True)
print(data_no_headers)
# Names read from the first row
data_with_headers = genfromtxt('SimpleDataWithHeaders.csv', delimiter=',', dtype='str', autostrip=True, names=True)
print(data_with_headers)
# Names given as a comma-separated string
data_with_headers = genfromtxt('SimpleDataWithHeaders.csv', delimiter=',', skip_header=1, dtype='str', autostrip=True, names="A,B")
print(data_with_headers)
# Names given as a list
mycols = ['a', 'b']
data_with_headers = genfromtxt('SimpleDataWithHeaders.csv', delimiter=',', skip_header=1, dtype='str', autostrip=True, names=mycols)
print(data_with_headers)

Running this code produces the output below, one block per print call. (I made a very simple CSV file with a header row and three data rows to illustrate the problem.) It works fine until I add the names parameter.

[['CODE' 'AIRPORT']
['HOU' 'Houston']
['ABQ' 'Alberquerque']
['BWI' 'Baltimore']]

[('', '') ('', '') ('', '')]

[('', '') ('', '') ('', '')]

[('', '') ('', '') ('', '')]
  • When using `names`, also specify `dtype=None`. `dtype=str` makes the field dtype 'U', a 0 element string (hence the '' results). The problem would be more obvious if you looked at `data.dtype` or `print(repr(data))`. – hpaulj Jun 19 '19 at 16:37
  • If you don't need the structured array with fields, you could skip the header, and just get a (n,2) string dtype array. – hpaulj Jun 19 '19 at 17:09
  • Changing dtype=None and adding encoding=None does get the values read in, but now it is no longer creating multiple rows, I get a single row of pairs instead... – HockeyGeekGirl Jun 19 '19 at 18:33
  • data_with_headers = genfromtxt('SimpleDataWithHeaders.csv',delimiter=',',dtype=None, encoding=None,autostrip=True,names=True) print(data_with_headers) – HockeyGeekGirl Jun 19 '19 at 18:33
  • [('HOU', 'Houston') ('ABQ', 'Alberquerque') ('BWI', 'Baltimore')] – HockeyGeekGirl Jun 19 '19 at 18:33
  • That's a 1d structured array. The pairs are records. Look at `print(repr(data))`. If you want a 2d array of strings skip the header. The header only serves to provide names for the fields of a structured array. – hpaulj Jun 19 '19 at 18:39

1 Answer

A simulated file:

In [243]: txt = """CODE, AIRPORT 
     ...: HOU, Houston 
     ...: ABQ, Alberquerque 
     ...: BWI, Baltimore"""                                                               

Read without using the headers:

In [244]: data = np.genfromtxt(txt.splitlines(), delimiter=',', dtype=str, skip_header=1, 
     ...: encoding=True)                                                                  
In [245]: data                                                                            
Out[245]: 
array([['HOU', ' Houston'],
       ['ABQ', ' Alberquerque'],
       ['BWI', ' Baltimore']], dtype='<U13')

The result is a 2d array with a string dtype.
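The leading spaces in ' Houston' come from the file itself. As a sketch that is not part of the session above, adding autostrip=True (which the question already uses) removes them. The encoding='utf-8' argument is an assumption here and needs NumPy 1.14 or later:

```python
import numpy as np

# Same simulated file as above, as a list of lines
lines = ["CODE, AIRPORT", "HOU, Houston", "ABQ, Alberquerque", "BWI, Baltimore"]

# Skip the header row and strip whitespace around every value
stripped = np.genfromtxt(lines, delimiter=',', dtype=str, skip_header=1,
                         autostrip=True, encoding='utf-8')

print(stripped.shape)  # (3, 2): three rows, two columns
print(stripped)
```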

Using the header, and dtype=None:

In [246]: data = np.genfromtxt(txt.splitlines(), delimiter=',', dtype=None, names=True,
     ...: encoding=True)
In [247]: data                                                                            
Out[247]: 
array([('HOU', ' Houston'), ('ABQ', ' Alberquerque'),
       ('BWI', ' Baltimore')],
      dtype=[('CODE', '<U3'), ('AIRPORT', '<U13')])
In [248]: data.shape                                                                      
Out[248]: (3,)
In [249]: data['CODE']                                                                    
Out[249]: array(['HOU', 'ABQ', 'BWI'], dtype='<U3')

The result is a structured array - 1d with 2 fields, which are accessed by name.
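If the structured array has already been read but a 2d array of strings is wanted after all, one possible route is to go through a list of tuples. This is a minimal sketch rather than the only way to do it; the variable names and the encoding='utf-8' argument are assumptions for illustration:

```python
import numpy as np

lines = ["CODE, AIRPORT", "HOU, Houston", "ABQ, Alberquerque", "BWI, Baltimore"]

# 1d structured array: one record per data row, fields named from the header
records = np.genfromtxt(lines, delimiter=',', dtype=None, names=True,
                        autostrip=True, encoding='utf-8')

codes = records['CODE']                 # one column, selected by field name
first_airport = records[0]['AIRPORT']   # one value: record 0, field AIRPORT

# tolist() gives a list of tuples, which np.array turns into a 2d string array
table = np.array(records.tolist())

print(records.shape)  # (3,)
print(table.shape)    # (3, 2)
```

The field names are lost in the conversion, so this only makes sense when row and column indexing matters more than access by name.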

With dtype=str the result is also structured, but each field's dtype is 'U', a zero-length string, which is why every value displays as an empty string:

In [250]: data = np.genfromtxt(txt.splitlines(), delimiter=',', dtype=str, names=True,
     ...: encoding=True)
In [251]: data                                                                            
Out[251]: 
array([('', ''), ('', ''), ('', '')],
      dtype={'names':['CODE','AIRPORT'], 'formats':['<U','<U'], 'offsets':[0,0], 'itemsize':2})

A plain print omits the dtype, which hides the cause:

In [252]: print(data)                                                                     
[('', '') ('', '') ('', '')]
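A quick way to check what was actually read is to look at the dtype directly. The following is a sketch using the dtype=None variant; the widths dictionary is a hypothetical helper for illustration, and encoding='utf-8' is again an assumption:

```python
import numpy as np

lines = ["CODE, AIRPORT", "HOU, Houston", "ABQ, Alberquerque", "BWI, Baltimore"]
data = np.genfromtxt(lines, delimiter=',', dtype=None, names=True,
                     autostrip=True, encoding='utf-8')

print(data)              # values only, no dtype
print(repr(data))        # values plus field names and formats
print(data.dtype.names)  # field names taken from the header row

# Characters per field: bytes per field divided by bytes per unicode character.
# A width of 0 would explain values that display as empty strings.
char_size = np.dtype('U1').itemsize
widths = {name: data.dtype[name].itemsize // char_size
          for name in data.dtype.names}
print(widths)
```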
hpaulj