Missing spaces in numpy array

Question

I am trying to read a Unicode data file into a few lists. The file mixes Unicode, integer, and float columns in this format:

Է   1335    1.1
դ   1380    1.2
    32  1.3
ն   1398    1.4
ե   1381    1.5
ր   1408    1.6

I am reading the file with numpy.genfromtxt, following the approach from another question about numpy.genfromtxt:

import numpy as np

# Decode the raw bytes of the first column as UTF-8
decodef = lambda x: x.decode("utf-8")
arr = np.genfromtxt("./data_files/data", delimiter="\t", dtype="U1, i4, f8", converters={0: decodef})

This gives me a numpy.ndarray in which the space in the first column has become an empty string:

('Է', 1335, 1.1)
('դ', 1380, 1.2)
('', 32, 1.3)
('ն', 1398, 1.4)
('ե', 1381, 1.5)
('ր', 1408, 1.6)

I have already tried to solve the space issue with the autostrip=False (the default value), missing_values=" ", and replace_space='_' parameters, but I still get the same array with an empty item in place of the space. I guess all these parameters are intended only for delimiter manipulation?

Any ideas how to overcome this?

Python version 3.4.5 is being used.

What is the problem? This is a structured array. The empty string in the 3rd record? Given the dtype the array display looks normal. — hpaulj, Jan 03 '17 at 14:58
Yes, the empty string in the third record. For other symbols everything works as expected. Edited that part to clarify. ) — , Jan 03 '17 at 15:00
Some parameters apply to field names, not values. Is there a fill value parameter? — hpaulj, Jan 03 '17 at 15:12

score 1 · Answer 1 · answered Jan 03 '17 at 15:12

Apparently the genfromtxt method somehow removes the space.

If you use

decodef = lambda x: x.decode("utf-8") if x != '' else " "
arr = np.genfromtxt("text", delimiter="\t", dtype="U1, i4, f8",converters={0: decodef})

It works (tested on Python 2.7). I still do not understand exactly why, though.
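A note for Python 3, as a minimal sketch: there `b"" != ""` is always true, because bytes never compare equal to str, so a converter that receives bytes has to decode first and test the decoded text. The helper name `decode_keep_space` is made up for illustration, and the `isinstance` check is an assumption to cover NumPy versions that may pass str instead of bytes to converters.

```python
def decode_keep_space(raw):
    """Decode one field and map an empty result back to a single space."""
    # Assumption: the converter may be handed bytes or str, depending on
    # the NumPy version and its encoding settings.
    text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
    return text if text != "" else " "


# In Python 3, bytes and str are never equal, so this test cannot detect
# an empty bytes field:
bytes_vs_str_differs = (b"" != "")

empty_field = decode_keep_space(b"")                   # empty field becomes " "
armenian = decode_keep_space("Է".encode("utf-8"))      # normal field is unchanged
```

It would be passed the same way as the lambda above, i.e. converters={0: decode_keep_space}.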

Gauthier Feuillen

Hm. It didn't work for me. P.S. I am using Python 3.4.5. I don't know whether that matters in this particular case. – Jan 03 '17 at 15:26
Indeed, did this using 2.7 and it is not working in 3.4.5 ^^ – Gauthier Feuillen Jan 03 '17 at 15:41
Obvious (but utterly ugly): decodef = lambda x: x.decode("utf-8") if x.decode("utf-8") != '' else " " – Gauthier Feuillen Jan 03 '17 at 15:53
Do you really need to use np.genfromtxt ? – Gauthier Feuillen Jan 03 '17 at 15:59
With separate decoding it works. :) I was not able to make it work any other way. Having got it working with `genfromtxt`, I haven't gone back to the earlier approaches. – Jan 03 '17 at 16:35
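If genfromtxt is not a hard requirement, a plain-Python reader is one way to sidestep the problem. A plausible but unverified cause is that genfromtxt strips leading and trailing spaces from each line before splitting, which would swallow a space sitting at the start of the line. The sketch below removes only the line ending, so the space survives. The function name `read_rows` and the in-memory sample are made up for illustration.

```python
import io


def read_rows(handle):
    """Parse tab-separated (char, int, float) rows, keeping a lone space."""
    rows = []
    for line in handle:
        line = line.rstrip("\r\n")  # drop only the line ending, not spaces
        if not line:
            continue  # skip completely empty lines
        char, code, value = line.split("\t")
        rows.append((char, int(code), float(value)))
    return rows


# Small in-memory sample in the same layout as the data file
sample = "Է\t1335\t1.1\n \t32\t1.3\nն\t1398\t1.4\n"
rows = read_rows(io.StringIO(sample))
```

For the real file, the handle would come from open("./data_files/data", encoding="utf-8"), and the resulting list of tuples can be turned into a structured array with np.array(rows, dtype="U1, i4, f8") if one is still needed.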
