
I am a bit puzzled, because this worked when I ran the code 1-2 years ago.

I have a large structured numpy array with different data types and column names. I can save it with numpy.savetxt() if I provide a format string (fmt) that describes the data types of each column. However, if I want to save a selection of just a few columns via array[['col_name1', 'col_name2']] together with the fmt string for the two columns, I get the following error message: ValueError: fmt has wrong number of % formats: %i %i

Here is an example.

Saving the entire array works:

import numpy as np
arr = np.zeros(3, dtype=[('w', int), ('x', float), ('y', int), ('z', "i8")])
np.savetxt('works.txt', arr, fmt="%i %06f %i %i")

Saving two columns of it doesn't:

import numpy as np
arr = np.zeros(3, dtype=[('w', int), ('x', float), ('y', int), ('z', "i8")])
np.savetxt('ValueError.txt', arr[['w','y']], fmt="%i %i")

This gives me the error message:

ValueError: fmt has wrong number of % formats: %i %i

I have scripts that do exactly the same with large structured arrays and they worked when I used this 1-2 years ago.

I have no clue what is going on. After making the column selection, the array dtype object has the additional attributes offsets and itemsize. Does this cause the error?

In [131]: arr
Out[131]: 
array([(0, 0., 0, 0), (0, 0., 0, 0), (0, 0., 0, 0)],
     dtype=[('w', '<i8'), ('x', '<f8'), ('y', '<i8'), ('z', '<i8')])

In [132]: arr[['w','y']]
Out[132]: 
array([(0, 0), (0, 0), (0, 0)],
     dtype={'names':['w','y'], 'formats':['<i8','<i8'], 'offsets':[0,16], 'itemsize':32})

How can I fix this? Thank you!

DAH
  • Recent versions changed the multifield indexing. It now produces a view. `savetxt` tries to format a tuple version of a row, e.g. `tuple(arr[['w','y']][0])`. I haven't tried it myself, but am not surprised that it would give a problem. `recfunctions` has a `repack_fields` function that can make a clean copy. – hpaulj Apr 12 '20 at 00:02
  • https://stackoverflow.com/q/61115462/901925 – hpaulj Apr 12 '20 at 00:39

1 Answer

In [66]: arr = np.zeros(3, dtype=[('w', int), ('x', float), ('y', int), ('z', "i8")])                  
In [67]: arr                                                                                           
Out[67]: 
array([(0, 0., 0, 0), (0, 0., 0, 0), (0, 0., 0, 0)],
      dtype=[('w', '<i8'), ('x', '<f8'), ('y', '<i8'), ('z', '<i8')])

It works for me:

In [71]: np.savetxt('ValueError.txt', arr[['w','y']], fmt="%i %i")                                     
In [72]: cat ValueError.txt                                                                            
0 0
0 0
0 0

I suspected the way savetxt formats a 'row', but converting a row to a tuple works fine here:

In [73]: tuple(arr[['w','y']][0])                                                                      
Out[73]: (0, 0)
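
To make the 'row' check concrete, here is a rough sketch of the per-row step as I understand it: each record is turned into a tuple and pushed through the format string. This is a simplified stand-in, not savetxt's actual code, and the names `sel` and `lines` are only for illustration.

```python
import numpy as np

arr = np.zeros(3, dtype=[('w', int), ('x', float), ('y', int), ('z', 'i8')])
sel = arr[['w', 'y']]  # the two-column selection from the question

fmt = "%i %i"
# Simplified stand-in for the per-row formatting: one tuple per record,
# so fmt needs exactly one % specifier per selected field.
lines = [fmt % tuple(row) for row in sel]
print(lines)
```

If this step succeeds, the selection itself can be formatted, and the error is more likely to come from how the number of columns is counted in a particular version.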

My version:

In [74]: np.__version__                                                                                
Out[74]: '1.18.2'

In any case, it is possible to clean up the field selection:

In [76]: import numpy.lib.recfunctions as rf                                                           
In [77]: rf.repack_fields(arr[['w','y']])                                                              
Out[77]: array([(0, 0), (0, 0), (0, 0)], dtype=[('w', '<i8'), ('y', '<i8')])
In [78]: np.savetxt('ValueError.txt', _77, fmt="%i %i")  
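
The same steps as a standalone script, in case the IPython `_77` shortcut is confusing. This is a minimal sketch: the sample values, the temporary file name, and the variable names `view`, `packed`, and `text` are my own, and it assumes a NumPy version that provides `repack_fields`.

```python
import os
import tempfile

import numpy as np
import numpy.lib.recfunctions as rf

arr = np.zeros(3, dtype=[('w', int), ('x', float), ('y', int), ('z', 'i8')])
arr['w'] = [1, 2, 3]  # illustrative values, not from the question
arr['y'] = [4, 5, 6]

view = arr[['w', 'y']]           # on recent NumPy: a view that keeps the parent's offsets
packed = rf.repack_fields(view)  # a copy with the unused bytes removed

# The view keeps the full record size; the repacked copy only holds 'w' and 'y'.
print(view.dtype.itemsize, packed.dtype.itemsize)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'two_columns.txt')  # hypothetical file name
    np.savetxt(path, packed, fmt="%i %i")
    with open(path) as fh:
        text = fh.read()

print(text)
```

Note that `repack_fields` returns a copy, so later changes to `arr` will not show up in `packed`.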

The multifield indexing was reworked over several NumPy versions, so you might have one where it didn't work quite right.
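
If you would rather not depend on how a given version handles multifield indexing, another option is to skip the structured selection and stack the individual fields into a plain 2-D array before saving. This is only a sketch of that alternative; the sample values and file name are made up. One caveat: `column_stack` promotes all columns to a common dtype, so it fits best when the selected fields share a type, as `w` and `y` do here.

```python
import os
import tempfile

import numpy as np

arr = np.zeros(3, dtype=[('w', int), ('x', float), ('y', int), ('z', 'i8')])
arr['w'] = [1, 2, 3]  # illustrative values, not from the question
arr['y'] = [4, 5, 6]

# Single-field indexing returns ordinary 1-D arrays; stack them as columns.
cols = np.column_stack([arr['w'], arr['y']])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'two_columns.txt')  # hypothetical file name
    np.savetxt(path, cols, fmt="%i %i")
    with open(path) as fh:
        text = fh.read()

print(text)
```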

hpaulj