The string type of the separator argument must match the string type of the array:
In [44]: ff = np.array([['a:bc','d:ef'],['g:hi','j:kl']])
In [45]: ff
Out[45]:
array([['a:bc', 'd:ef'],
       ['g:hi', 'j:kl']], dtype='<U4')
In [46]: np.char.split(ff,':')
Out[46]:
array([[list(['a', 'bc']), list(['d', 'ef'])],
       [list(['g', 'hi']), list(['j', 'kl'])]], dtype=object)
In [47]: np.char.split(ff.astype('S5'),b':')
Out[47]:
array([[list([b'a', b'bc']), list([b'd', b'ef'])],
       [list([b'g', b'hi']), list([b'j', b'kl'])]], dtype=object)
'U4' is unicode, the default string type in Py3; 'S4' is bytestring, the default in Py2. b':' is a bytestring, and u':' (plain ':' in Py3) is unicode.
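A minimal sketch of keeping the separator in step with the array's dtype. The helper name `split_matching` is hypothetical (not part of NumPy); it simply encodes or decodes the separator according to `arr.dtype.kind` ('U' for unicode, 'S' for bytes) before calling np.char.split, assuming ASCII separators.

```python
import numpy as np

def split_matching(arr, sep):
    """Hypothetical helper: call np.char.split with a separator whose
    type (str or bytes) matches the array's string dtype.
    Assumes the separator is plain ASCII."""
    if arr.dtype.kind == 'S' and isinstance(sep, str):
        sep = sep.encode('ascii')       # bytes array needs a bytes separator
    elif arr.dtype.kind == 'U' and isinstance(sep, bytes):
        sep = sep.decode('ascii')       # unicode array needs a str separator
    return np.char.split(arr, sep)

ff = np.array([['a:bc', 'd:ef'], ['g:hi', 'j:kl']])
print(split_matching(ff, ':'))                  # str separator on a '<U4' array
print(split_matching(ff.astype('S5'), ':'))     # ':' is encoded to b':' first
```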
This np.char.split is a bit awkward to use, since the result is object dtype, with lists of the split strings.
To get 2 separate arrays I'd use frompyfunc to apply an unpacking (`_46` and `_47` are IPython's references to Out[46] and Out[47]):
In [50]: np.frompyfunc(lambda alist: tuple(alist), 1,2)(_46)
Out[50]:
(array([['a', 'd'],
       ['g', 'j']], dtype=object), array([['bc', 'ef'],
       ['hi', 'kl']], dtype=object))
In [51]: np.frompyfunc(lambda alist: tuple(alist), 1,2)(_47)
Out[51]:
(array([[b'a', b'd'],
       [b'g', b'j']], dtype=object), array([[b'bc', b'ef'],
       [b'hi', b'kl']], dtype=object))
To get string-dtype arrays, though, I'd still have to use astype:
In [52]: _50[0].astype('U4')
Out[52]:
array([['a', 'd'],
       ['g', 'j']], dtype='<U4')
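The split, unpack and astype steps can be wrapped into one function. This is a sketch only: `split_in_two` is a hypothetical name, it assumes every element splits into exactly two pieces, and it passes the unsized kind ('U' or 'S') to astype so the width is taken from the data instead of being hard-coded as 'U4'.

```python
import numpy as np

def split_in_two(arr, sep):
    """Hypothetical helper: split every element of a string array at `sep`
    and return two string-dtype arrays (left and right parts).
    Assumes each element splits into exactly two pieces."""
    pieces = np.char.split(arr, sep)                       # object array of lists
    unpack = np.frompyfunc(lambda alist: tuple(alist), 1, 2)
    left, right = unpack(pieces)                           # two object arrays
    kind = arr.dtype.kind                                  # 'U' or 'S'
    # astype with an unsized 'U'/'S' lets NumPy size the result from the data
    return left.astype(kind), right.astype(kind)

ff = np.array([['a:bc', 'd:ef'], ['g:hi', 'j:kl']])
left, right = split_in_two(ff, ':')
print(left)
print(right)
```

For a bytes array, pass a bytes separator, e.g. `split_in_two(ff.astype('S5'), b':')`.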
I could combine the unpacking and astype with np.vectorize by providing otypes (even a mix of dtypes!):
In [53]: np.vectorize(lambda alist:tuple(alist), otypes=['U4','S4'])(_46)
Out[53]:
(array([['a', 'd'],
       ['g', 'j']], dtype='<U1'), array([[b'bc', b'ef'],
       [b'hi', b'kl']], dtype='|S2'))
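Note that Out[53] shows '<U1' and '|S2' rather than 'U4' and 'S4': as far as I can tell, np.vectorize keeps only the type character of a string entry in otypes and lets the data decide the width. A sketch that therefore passes just the kinds:

```python
import numpy as np

ff = np.array([['a:bc', 'd:ef'], ['g:hi', 'j:kl']])
pieces = np.char.split(ff, ':')          # object array of 2-element lists

# One otypes entry per output: a unicode output and a bytes output.
# With otypes given, vectorize does not need a trial call to infer them.
unpack = np.vectorize(lambda alist: tuple(alist), otypes=['U', 'S'])
left, right = unpack(pieces)
print(left.dtype, right.dtype)
```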
Usually frompyfunc is faster than vectorize.
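A rough way to check that on your own data. Timings depend on the machine and NumPy version, so no numbers are claimed here; the array size is arbitrary. Passing otypes=[object, object] to vectorize makes both calls return object arrays, so the comparison is like for like.

```python
import timeit
import numpy as np

# Larger input so per-call overhead is measurable (size chosen arbitrarily)
ff = np.array(['abc:def'] * 10_000).reshape(100, 100)
pieces = np.char.split(ff, ':')

unpack_fpf = np.frompyfunc(lambda alist: tuple(alist), 1, 2)
unpack_vec = np.vectorize(lambda alist: tuple(alist), otypes=[object, object])

# Both should produce the same two object arrays
fpf_left, fpf_right = unpack_fpf(pieces)
vec_left, vec_right = unpack_vec(pieces)

t_fpf = timeit.timeit(lambda: unpack_fpf(pieces), number=20)
t_vec = timeit.timeit(lambda: unpack_vec(pieces), number=20)
print(f"frompyfunc: {t_fpf:.4f} s   vectorize: {t_vec:.4f} s")
```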
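This unpacking won't work if the split produces lists of different lengths, since frompyfunc(..., 1, 2) expects every call to return exactly 2 values, and 'j:kl:xyz' splits into 3: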
In [54]: ff = np.array([['a:bc','d:ef'],['g:hi','j:kl:xyz']])
In [55]: np.char.split(ff,':')
Out[55]:
array([[list(['a', 'bc']), list(['d', 'ef'])],
       [list(['g', 'hi']), list(['j', 'kl', 'xyz'])]], dtype=object)
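If only the first separator matters, one workaround (a different function from the ones used above) is np.char.partition. It splits at the first occurrence only and returns a string-dtype array with an extra trailing axis of length 3 (before, separator, after), so uneven splits don't break the unpacking:

```python
import numpy as np

ff = np.array([['a:bc', 'd:ef'], ['g:hi', 'j:kl:xyz']])

# Split at the first ':' only; result shape is ff.shape + (3,)
parts = np.char.partition(ff, ':')
left, right = parts[..., 0], parts[..., 2]     # drop the separator column
print(left)
print(right)
```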
===
With a chararray, all these np.char functions are available as methods.
In [59]: np.char.asarray(ff)
Out[59]:
chararray([['a:bc', 'd:ef'],
           ['g:hi', 'j:kl:xyz']], dtype='<U8')
In [60]: np.char.asarray(ff).split(':')
Out[60]:
array([[list(['a', 'bc']), list(['d', 'ef'])],
       [list(['g', 'hi']), list(['j', 'kl', 'xyz'])]], dtype=object)
See the note in the np.char docs:

The chararray class exists for backwards compatibility with Numarray, it is not recommended for new development. Starting from numpy 1.4, if one needs arrays of strings, it is recommended to use arrays of dtype object_, string_ or unicode_, and use the free functions in the numpy.char module for fast vectorized string operations.
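Following that recommendation, the chararray method call in In [60] can be replaced by the free function applied to an ordinary ndarray. A quick check that both give the same lists:

```python
import numpy as np

ff = np.array([['a:bc', 'd:ef'], ['g:hi', 'j:kl:xyz']])   # plain ndarray, '<U8'

via_method = np.char.asarray(ff).split(':')   # chararray method (legacy style)
via_function = np.char.split(ff, ':')          # free function (recommended)
print(via_function)
```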