I am using the NumPy library in Python to import CSV file data into an ndarray as follows:

import numpy as np

data = np.genfromtxt('mydata.csv',
                     delimiter=',', dtype=None, names=True)

The result provides the following column names:

print(data.dtype.names)

('row_label',
 'MyDataColumn1_0',
 'MyDataColumn1_1')

The original column names are:

row_label, My-Data-Column-1.0, My-Data-Column-1.1

It appears that NumPy is forcing my column names to adopt C-style variable name formatting. Yet there are many cases where my Python scripts require access to columns according to column name, so I need to ensure that column names remain constant. To accomplish this either NumPy needs to preserve the original column names or else I need to convert my column names to the format NumPy is using.

  • Is there a way to preserve the original column names during import?

  • If not, is there an easy way to convert column labels to use the format NumPy is using, preferably using some NumPy function?

holocronweaver
  • See here: http://stackoverflow.com/questions/14429992/can-i-rename-fields-in-a-numpy-record-array – RidingTheRails Apr 15 '13 at 16:26
  • @RichardHollis This is not the same question - I can already import column names, but I want to preserve their formatting. Perhaps I should modify the question title. – holocronweaver Apr 15 '13 at 18:26

1 Answer

If you set names=True, the first line of your data file is passed through a NameValidator instance that genfromtxt creates like this:

validate_names = NameValidator(excludelist=excludelist,
                               deletechars=deletechars,
                               case_sensitive=case_sensitive,
                               replace_space=replace_space)

These are the options you can supply:

excludelist : sequence, optional
    A list of names to exclude. This list is appended to the default list
    ['return','file','print']. Excluded names are appended an underscore:
    for example, `file` would become `file_`.
deletechars : str, optional
    A string combining invalid characters that must be deleted from the
    names.
defaultfmt : str, optional
    A format used to define default field names, such as "f%i" or "f_%02i".
autostrip : bool, optional
    Whether to automatically strip white spaces from the variables.
replace_space : char, optional
    Character(s) used in replacement of white spaces in the variables
    names. By default, use a '_'.
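On the second question (converting labels to NumPy's format), one option is to run the labels through the same validator yourself. This is only a sketch: NameValidator lives in the private module numpy.lib._iotools (the file linked below), so the import is an assumption that may not hold across NumPy versions, and the exact output can differ between versions.

```python
# NameValidator is defined in a private module, so this import is an
# assumption that may break between NumPy versions.
from numpy.lib._iotools import NameValidator

original = ["row_label", "My-Data-Column-1.0", "My-Data-Column-1.1"]

# Default settings: the same clean-up genfromtxt applies when names=True.
converted = NameValidator().validate(original)

# Map each original label to the field name NumPy would generate for it.
name_map = dict(zip(original, converted))
print(name_map)
```

With such a mapping, a script can keep using the original labels and look up the generated field name when indexing the array.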

You could try passing an empty string as deletechars, but you'd be better off starting from the default set and modifying it:

defaultdeletechars = set("""~!@#$%^&*()-=+~\|]}[{';: /?.>,<""")

Just take out the period and minus sign from that set, and pass it as:

np.genfromtxt(..., names=True, deletechars="""~!@#$%^&*()=+~\|]}[{';: /?>,<""")
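A self-contained sketch of that call, using an in-memory file in place of mydata.csv. The sample rows are made up for illustration, and encoding='utf-8' is an assumption that requires a NumPy version that accepts the encoding argument. The deletechars string is written as a raw string so the backslash is not treated as an escape.

```python
import io

import numpy as np

# Made-up sample data standing in for mydata.csv.
csv_text = io.StringIO(
    "row_label,My-Data-Column-1.0,My-Data-Column-1.1\n"
    "a,1.5,2.5\n"
    "b,3.0,4.0\n"
)

# The default delete set with the period and minus sign removed.
keep_dash_and_dot = r"""~!@#$%^&*()=+~\|]}[{';: /?>,<"""

data = np.genfromtxt(csv_text, delimiter=",", dtype=None, names=True,
                     encoding="utf-8", deletechars=keep_dash_and_dot)

print(data.dtype.names)
print(data["My-Data-Column-1.0"])
```

The columns can then be accessed by their original labels, for example data["My-Data-Column-1.0"].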

Here's the source: https://github.com/numpy/numpy/blob/master/numpy/lib/_iotools.py#l245

askewchan
  • I appreciate directly linking the relevant source code and adding a better alternative to an empty string for deletechars. Works just as you suggested. Thanks! – holocronweaver Apr 15 '13 at 17:42
  • You're welcome, glad it works for your case. You probably know this, but going into the future, your code will be more robust if you can try to keep your column names to be simpler and avoid the commonly prohibited characters. – askewchan Apr 15 '13 at 21:20