
I'm trying to convert a list of astropy Tables into a NumPy array of astropy Tables. At first I tried np.asarray(mylist) and np.array(mylist), but the astropy Tables inside the list were converted, together with the list, into a plain NumPy ndarray.

Example:

import numpy as np
from astropy.table import Table

t1 = Table({'a': [1, 2, 3], 'b': [4, 5, 6]})
t2 = Table({'a': [7, 8, 9], 'b': [10, 11, 12]})
mylist = [t1, t2]
print(mylist)

The output is:

[<Table length=3>
  a     b
int64 int64
----- -----
    1     4
    2     5
    3     6, 
<Table length=3>
  a     b
int64 int64
----- -----
    7    10
    8    11
    9    12]

Then, if I apply np.array(mylist), the output is:

array([[(1,  4), (2,  5), (3,  6)],
       [(7, 10), (8, 11), (9, 12)]], dtype=[('a', '<i8'), ('b', '<i8')])

But I want the following:

array([<Table length=3>
  a     b
int64 int64
----- -----
    1     4
    2     5
    3     6, 
<Table length=3>
  a     b
int64 int64
----- -----
    7    10
    8    11
    9    12])

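My current solution is:

import numpy as np

def to_object_array(mylist):
    if isinstance(mylist, list):
        # Pre-allocate a 1-D object array so NumPy never tries to convert the Tables themselves
        myarray = np.empty(len(mylist), dtype='object')
        for i in range(len(myarray)):
            myarray[i] = mylist[i]  # each Table is stored as an opaque Python object
    else:
        myarray = mylist
    return myarray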

It works, but I suspect NumPy may have something built in to do this; I just can't find it.

2PiOmega
  • Will you please share the output of the code and your desired output? – Behdad Abdollahi Moghadam Oct 02 '21 at 07:30
  • Why do you want to change a list of Tables into an array of Tables? Generally, that will not be where any speed-up is, and a list is a perfectly fine data structure to hold your Tables. – 9769953 Oct 02 '21 at 07:43
  • I've updated the question with an example. I don't want this change for a speed-up, but to apply a boolean selection that picks out some of these tables. Since boolean indexing doesn't work on lists, the easiest solution I know is to use a NumPy array. – 2PiOmega Oct 02 '21 at 07:54
  • Unless your list is very long, I would stick to the list, for clarity. If you show what you actually want to do, there is probably an answer that works just as well for lists. (Since it's unclear how you want to do the boolean selection precisely.) – 9769953 Oct 02 '21 at 07:59
  • I have a boolean array, call it selection, of the same length as the list of astropy Tables. I want to do `my_select_list = mylist[selection]`. – 2PiOmega Oct 02 '21 at 08:11
  • OK, I have a fix that allows `np.array([t1, t2], dtype=object)` to give the expected result from the original post. At least for the next astropy release this will fix the issue. – Tom Aldcroft Oct 02 '21 at 12:14

1 Answer


This looks like an Astropy Table limitation, which I would consider a bug. Astropy's Table refuses dtype coercion when it is converted to a NumPy array, since such coercion doesn't always work: there is a specific check in the code that raises a ValueError whenever a dtype is specified during that conversion.

Of course, here you are dealing with a list, but that doesn't help: NumPy converts the list to an array and, in doing so, also converts each individual element. Without a dtype you get a 2D array (as in your example); with a dtype specified, you get the ValueError again:

ValueError: Datatype coercion is not allowed

The bug (as I see it) is that Astropy raises for any dtype other than None. So even dtype=object triggers this error, and I'm not sure it should.
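To make the distinction concrete, here is a pure-Python sketch of the guard as described above. check_coercion is a hypothetical function, not astropy's actual source: allow_object=False mirrors the reported behaviour (anything but None raises), while allow_object=True sketches the relaxation argued for here, where dtype=object is let through because an object array only stores references and coerces nothing.

```python
def check_coercion(dtype, allow_object=False):
    """Hypothetical sketch of the dtype guard described above (not astropy's code)."""
    if dtype is None:
        return  # plain conversion, no dtype requested
    if allow_object and dtype in (object, "object", "O"):
        return  # object arrays just hold references, so nothing is coerced
    raise ValueError("Datatype coercion is not allowed")


for allow in (False, True):
    try:
        check_coercion(object, allow_object=allow)
        print(f"allow_object={allow}: dtype=object accepted")
    except ValueError as err:
        print(f"allow_object={allow}: {err}")
```

With allow_object=False the loop prints the same "Datatype coercion is not allowed" message quoted above; with allow_object=True, dtype=object is accepted.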

Your workaround is therefore fine, in my opinion. Not ideal, but it does the job, and it's basically just two or three lines of code.


However, since you mention boolean indexing, consider the following, which keeps everything in a list (I think that's the better option here: NumPy arrays are really meant for numbers, not so much for objects):

indices = [True, False, True, False]
my_list = [....]  # list of tables
selection = [item for item, index in zip(my_list, indices) if index]  # filter all True values

or, for integer indices:

indices = [1, 3, 5, 6]
my_list = [....] # list of tables
selection = [my_list[i] for i in indices]

That's the same number of lines as with NumPy indexing, and unless your list grows to thousands (or millions) of elements, you won't notice a performance difference. (If it does grow to millions of elements, you may need to reconsider your data structures anyway, which would require more rewriting elsewhere in your code.)
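If you want something closer to the `mylist[selection]` syntax from the comments, both list-based approaches can be wrapped in one small helper. take is a hypothetical name, and the strings stand in for Tables. It uses the standard library's itertools.compress for the boolean case and treats anything else as integer positions. One assumption worth noting: a NumPy boolean array holds np.bool_ values, which are not Python bools, so convert it with .tolist() first.

```python
from itertools import compress
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def take(items: Sequence[T], selection) -> List[T]:
    """Hypothetical helper: NumPy-style selection on a plain Python list.

    A selection made only of bools is treated as a mask; anything else is
    treated as a sequence of integer positions.
    """
    selection = list(selection)
    # bool is a subclass of int, so check for a mask first; otherwise
    # True/False would silently be used as positions 1/0
    if selection and all(isinstance(s, bool) for s in selection):
        if len(selection) != len(items):
            raise IndexError("boolean mask length does not match the list length")
        # compress() keeps the items whose matching selector is truthy
        return list(compress(items, selection))
    return [items[i] for i in selection]


tables = ["t0", "t1", "t2", "t3"]  # stand-ins for astropy Tables
print(take(tables, [True, False, True, False]))  # ['t0', 't2']
print(take(tables, [3, 1]))                      # ['t3', 't1']
```

The explicit length check matters because compress() stops at the shorter of its two inputs, so a mismatched mask would otherwise be silently truncated.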

9769953
  • Interesting and thanks for the insight. It's worth noting that pandas DataFrame also does not easily support making a numpy object array of dataframes. It is surprising to me that numpy is calling `Table.__array__` on each of the two list elements in `np.array([t1, t2])`. Anyway, I'll see if there is some simple fix, but agreed with your suggestions on using plain Python lists. – Tom Aldcroft Oct 02 '21 at 11:12
  • @TomAldcroft Thanks for the response Tom. I agree it may be an issue with NumPy (as well), but I think that choice was made for ease of converting or combining arrays. You may already have seen the issue filed at https://github.com/astropy/astropy/issues/12229 . – 9769953 Oct 02 '21 at 16:40