String concatenation in Python list (of strings) vs. numpy array (of strings)

Question

For the purpose of my application, I can declare an array of strings in two ways:

As a list: strArr1 = [""] * 5, or
As a numpy array: strArr2 = numpy.empty([5], dtype=str)

However, I see the following difference when I try to concatenate characters to array elements. In the first case, e.g.

strArr1[0] += 'a'
strArr1[0] += 'b'

gives me as expected ['ab', '', '', '', ''].

In the second case however,

strArr2[0] += 'a'
strArr2[0] += 'b'

gives me the result ['a', '', '', '', ''].

Why does concatenation not work as expected for the elements of the numpy array? Also, given the constraint that I must extend the elements of my array one character at a time, could anyone suggest an efficient and Pythonic approach?

Thanks.

What else are you doing with these lists or arrays? So far I don't see a good reason to use the array form. — hpaulj, Feb 13 '17 at 05:19

Neo X · Answer 1 · 2017-02-13T05:21:11.973

0

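NumPy string arrays have a fixed maximum element length. With dtype=str and no length given, that length defaults to a single character, which is why 'ab' is truncated to 'a'. You can use strArr2 = numpy.empty([5], dtype='S10') instead, where 10 is the maximum length each item can hold; anything longer is truncated.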
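A minimal sketch of the fixed-width behaviour, assuming Python 3 (where the unicode dtype 'U10' plays the role of 'S10') and using np.zeros rather than numpy.empty so that every element is guaranteed to start as an empty string. The names narrow and wide are only illustrative.

```python
import numpy as np

# dtype=str with no length gives one-character elements ('U1' on Python 3).
narrow = np.zeros(5, dtype=str)
narrow[0] += 'a'
narrow[0] += 'b'   # 'ab' does not fit in one character and is cut back to 'a'

# An explicit width leaves room for up to 10 characters per element.
wide = np.zeros(5, dtype='U10')
wide[0] += 'a'
wide[0] += 'b'

# Anything longer than the declared width is silently truncated on assignment.
wide[1] = 'x' * 12

print(narrow.dtype, narrow[0])
print(wide.dtype, wide[0], len(wide[1]))
```

The width has to be chosen up front, so this only fits cases where the maximum string length is known in advance.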

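Or use strArr2 = numpy.empty([5], dtype=object), which lets you store arbitrary Python objects in the array, including strings of any length. Note that the elements of an object array start out as None, so set them to '' before concatenating.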
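A minimal sketch of the object-dtype option. It assumes the elements should start as empty strings, so it uses np.full instead of numpy.empty, which fills an object array with None.

```python
import numpy as np

# Each slot holds a reference to an ordinary Python str, so there is no width limit.
str_arr = np.full(5, '', dtype=object)

str_arr[0] += 'a'
str_arr[0] += 'b'
str_arr[1] += 'a much longer string than ten characters'

print(str_arr[0])          # the full concatenated string is kept
print(type(str_arr[0]))    # plain Python str, not a NumPy string scalar
```

The trade-off is that an object array stores Python references, so it behaves much like a list and gains little from NumPy's vectorised operations.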

See Data type objects (dtype).
To extend strings efficiently one character at a time, you could instead keep a Python list of characters for each element and append each new character to that list. Once all characters are in place, use join to convert each list back to a string.
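The append-then-join idea could look roughly like this. It is a sketch under the assumption that there are five strings, as in the question; the names parts and result are illustrative.

```python
n = 5

# One independent character list per string.
# Note: [[]] * n would create n references to the same list.
parts = [[] for _ in range(n)]

# Extend the strings one character at a time.
parts[0].append('a')
parts[0].append('b')
parts[2].append('z')

# Convert each character list to a string once everything has been appended.
result = [''.join(chars) for chars in parts]
print(result)
```

Appending to a list avoids building a new string object on every step, and the single join at the end does the concatenation in one pass.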

edited Feb 13 '17 at 05:21

answered Feb 13 '17 at 05:15

Neo X


If I declare `strArr2 = numpy.empty([5], dtype='S10')`, each element is of type `numpy.bytes_`, and then I cannot concatenate chars/str to these elements. – N. CHATURV3DI Feb 13 '17 at 05:27

On Py3 try `arr=np.zeros((5,), dtype='U10')` - unicode is standard on py3. Or mark your addition as byte string, e.g. `arr[0] += b'abc'` – hpaulj Feb 13 '17 at 05:31
It works well for me (`Numpy 1.11.2 with Python 2.7.12`), and `type(strArr2[0])` gives ``. Anyway, using `dtype=object` or method 2 would be more appropriate. – Neo X Feb 13 '17 at 05:33

score 0 · Answer 2 · answered Feb 13 '17 at 06:12

Declaring the array with numpy.empty and dtype='U10' worked, but only when appending plain strings; marking the additions as bytes (e.g. b'abc') fails.
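The str/bytes distinction on Python 3 can be sketched as follows. This assumes Python 3 and uses np.zeros so the elements start empty; the array names are illustrative.

```python
import numpy as np

# 'U' arrays hold unicode text, so plain str pieces are appended.
text_arr = np.zeros(5, dtype='U10')
text_arr[0] += 'a'
text_arr[0] += 'b'

# 'S' arrays hold bytes, so the pieces appended must be bytes as well.
bytes_arr = np.zeros(5, dtype='S10')
bytes_arr[0] += b'a'
bytes_arr[0] += b'b'

print(text_arr[0])              # unicode element
print(bytes_arr[0])             # bytes element
print(bytes_arr[0].decode())    # decoded back to str
```

Matching the piece type to the array dtype is what matters: str with 'U', bytes with 'S'.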

Finally, for the sake of efficiency, I will follow Neo X's suggestion of building each string in a Python list and joining at the end, which should also avoid distribution-specific anomalies in the behaviour.

P.S. I am using Numpy 1.10.4 with Python 3.5.1.
