
Similar to this question, numpy.genfromtxt modifies my columns' names:

import numpy as np
from io import BytesIO  # https://stackoverflow.com/a/11970414/321973

str = 'x,-1,1\n0,1,1\n1,2,3'
data = np.genfromtxt(BytesIO(str.encode()), delimiter=',', names=True)
print(data.dtype.names)

yields ('x', '1', '1_1') instead of the desired ('x', '-1', '1') (or even better, ('x', -1, 1)). I tried deletechars="""~!@#$%^&*()=+~\|]}[{';: /?>,<""" as suggested there to no avail.

Tobias Kienzler
  • I think the column names have to be valid identifiers, which `-1` isn't. – jonrsharpe Mar 17 '15 at 11:31
  • Ultimately, I'd like to obtain a `np.meshgrid` by the way, so please go ahead and stop me from an XY-Problem ;) – Tobias Kienzler Mar 17 '15 at 11:32
  • @jonrsharpe You mean not all strings are valid? Is there a list of valid identifiers? – Tobias Kienzler Mar 17 '15 at 11:33
  • https://docs.python.org/3/reference/lexical_analysis.html#identifiers – jonrsharpe Mar 17 '15 at 11:35
  • But those are Python identifiers. Can't the `dtype.names` be any string, just as a `dict` can be constructed as `{'-1': -1, '1': 1}`? – Tobias Kienzler Mar 17 '15 at 11:37
  • I'm not certain, but [this doc](http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html) notes *"the “name” which must be a valid Python identifier"* – jonrsharpe Mar 17 '15 at 11:40
  • I see, thanks. I wonder why... Anyway, that makes my question rather unanswerable :/ – Tobias Kienzler Mar 17 '15 at 11:41
  • Presumably so you can access by attribute, or for some internal implementation reasons. And yes; it does, rather! – jonrsharpe Mar 17 '15 at 11:42
  • @jonrsharpe No, they don't need to be valid identifiers: `x = np.array((1,), dtype=[('-1', 'i')]); x['-1']` works perfectly fine – ali_m Mar 17 '15 at 12:57
  • @ali_m in that case, I'm stumped. – jonrsharpe Mar 17 '15 at 13:16
  • The reason why field names are 'mangled' is that you could ask for your structured array to be transformed into a [recarray](http://docs.scipy.org/doc/numpy/reference/generated/numpy.recarray.html) (where fields can be accessed as attributes). Such behavior would obviously fail if there's a non-alphanumerical character in your field name. Hence the preparsing by `NameValidator`. – Pierre GM Mar 17 '15 at 15:09
  • @PierreGM Those are *still* legal field names for a `recarray`! The dot-indexing syntax of course won't work, but `dict`-style indexing is fine: `x.view(np.recarray)['-1']`. – ali_m Mar 17 '15 at 15:17

1 Answer

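The behavior you're seeing comes from np.genfromtxt passing the field names through the NameValidator class, which automatically strips certain non-alphanumeric characters (including '-') and appends a numeric suffix to any duplicates that result. That is why '-1' becomes '1' and the original '1' becomes '1_1'.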
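To see the mangling in isolation, the validator can be called directly on the header fields. This is only a sketch: NameValidator lives in the private module numpy.lib._iotools, so the import path is an assumption that may not hold in every NumPy release.

```python
# NameValidator is a private NumPy helper; the import path below is an
# assumption and may change between NumPy versions.
from numpy.lib._iotools import NameValidator

header = ['x', '-1', '1']

# With default settings the validator deletes '-' and de-duplicates
# the resulting names by appending a numeric suffix.
validated = tuple(NameValidator().validate(header))
print(validated)
```

If the import succeeds, the printed names should match the ones reported in the question.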

It's perfectly legal for a field name to contain a '-' character, e.g.:

x = np.array((1,), dtype=[('-1', 'i')])
print(x['-1'])
# 1

In fact, two out of three of the modified field names you get back from np.genfromtxt are also not "valid Python identifiers" ('1' and '1_1', since they start with digits).
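A quick way to check this claim is Python's built-in str.isidentifier(), which needs no NumPy at all. The variable names below are only illustrative.

```python
# Field names from the file header and the names genfromtxt returns.
original = ('x', '-1', '1')
mangled = ('x', '1', '1_1')

# str.isidentifier() reports whether a string is a valid Python identifier.
original_valid = {name: name.isidentifier() for name in original}
mangled_valid = {name: name.isidentifier() for name in mangled}

print(original_valid)  # {'x': True, '-1': False, '1': False}
print(mangled_valid)   # {'x': True, '1': False, '1_1': False}
```

So the mangling does not make the names valid identifiers; it only removes the characters the validator is configured to delete.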

It's therefore possible to construct the array you describe as long as you bypass using np.genfromtxt to set the field names. One way to do it would be to initialize an empty array, specify the field names and dtypes explicitly, then fill it with the rest of the string contents:

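names = str.splitlines()[0].split(',')  # ['x', '-1', '1']
types = ('i',) * 3
dtype = list(zip(names, types))  # zip() is lazy in Python 3; NumPy needs a list

data = np.empty(2, dtype=dtype)
# skip_header replaces the removed skiprows argument. genfromtxt still mangles
# the names in its own result, but NumPy >= 1.14 copies fields by position.
data[:] = np.genfromtxt(BytesIO(str.encode()), delimiter=',', dtype=dtype,
                        skip_header=1)
print(repr(data))
# array([(0, 1, 1), (1, 2, 3)],
#       dtype=[('x', '<i4'), ('-1', '<i4'), ('1', '<i4')])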

However, just because you can doesn't mean you should - there may well be other unpredictable consequences to having a '-' in one of your field names. The safest option is to stick with using only valid Python identifiers as field names.
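If the header entries are really numeric coordinates (the comments mention np.meshgrid as the end goal), one alternative is to avoid field names entirely. The following is a minimal sketch, assuming the first column holds row coordinates and the remaining header entries hold column coordinates; the variable names are hypothetical.

```python
import numpy as np
from io import BytesIO

text = 'x,-1,1\n0,1,1\n1,2,3'

# Parse the header separately so the numeric labels stay numeric.
header = text.splitlines()[0].split(',')
col_values = np.array(header[1:], dtype=float)

# Read the body as a plain 2-D float array, skipping the header line.
table = np.genfromtxt(BytesIO(text.encode()), delimiter=',', skip_header=1)
row_values = table[:, 0]  # assumed: first column is the row coordinate
values = table[:, 1:]     # remaining columns are the data

# Coordinate grids with the same shape as the data block.
cols, rows = np.meshgrid(col_values, row_values)
print(cols.shape, rows.shape, values.shape)
```

This keeps -1 and 1 as numbers rather than strings and sidesteps the field-name restrictions altogether.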

ali_m