
I have a 2D NumPy array of structs:

import numpy as np
import pandas as pd
arr = np.zeros((3,5), [('x',int), ('y',float)])

That is:

array([[(0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0)],
       [(0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0)],
       [(0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0)]], 
      dtype=[('x', '<i8'), ('y', '<f8')])

I want to create a Pandas Panel from it. I tried the obvious:

pd.Panel(arr)

ValueError: The number of dimensions required is 3, but the number of dimensions of the ndarray given was 2
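A quick way to see why the constructor complains: in a structured array the record fields live in the dtype rather than on an axis, so `arr` really is 2-D, and each field selected by name is an ordinary (3, 5) array. A minimal NumPy-only check, using the array from the question:

```python
import numpy as np

# The array from the question: a 3x5 grid of records with fields 'x' and 'y'
arr = np.zeros((3, 5), [('x', int), ('y', float)])

# The record fields are part of the dtype, not an extra axis, so ndim is 2
print(arr.ndim)          # 2
print(arr.shape)         # (3, 5)
print(arr.dtype.names)   # ('x', 'y')

# Selecting one field by name gives an ordinary 2-D array of that field's dtype
print(arr['x'].shape, arr['x'].dtype.kind)  # (3, 5) i
print(arr['y'].shape, arr['y'].dtype.kind)  # (3, 5) f
```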

Then I discovered this hideous pile:

pd.Panel(dict(enumerate(pd.DataFrame(a) for a in arr)))

It produces:

<class 'pandas.core.panel.Panel'>
Dimensions: 3 (items) x 5 (major_axis) x 2 (minor_axis)
Items axis: 0 to 2
Major_axis axis: 0 to 4
Minor_axis axis: x to y

This "works" but is very inefficient and an eyesore.
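For comparison, here is a sketch that avoids the per-row loop entirely. It does not use `pd.Panel` (deprecated in pandas 0.20 and removed in 0.25); instead it keeps the same three axes in a single DataFrame, with the rows and columns of `arr` as a MultiIndex and the record fields as columns. The index names `items` and `major_axis` and the sample values are illustrative assumptions. Because the DataFrame is built directly from the flattened structured array, each field keeps its own dtype.

```python
import numpy as np
import pandas as pd

arr = np.zeros((3, 5), [('x', int), ('y', float)])
# Sample values (assumed) so the layout is visible
arr['x'] = np.arange(15).reshape(3, 5)
arr['y'] = arr['x'] / 10

# One row per grid cell: (row, column) of arr become a MultiIndex and the
# record fields become columns. ravel() walks arr in row-major order, which
# matches the order produced by from_product.
index = pd.MultiIndex.from_product(
    [range(arr.shape[0]), range(arr.shape[1])],
    names=['items', 'major_axis'],
)
df = pd.DataFrame(arr.ravel(), index=index)

print(df.shape)                 # (15, 2)
print(df.dtypes)                # x keeps an integer dtype, y stays float
print(df.loc[(2, 4), 'x'])      # 14
```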

How are such Panels meant to be constructed?

Edit: I submitted an issue here: https://github.com/pandas-dev/pandas/issues/14511

John Zwinck
  • What is the final shape you're after? Something like `pd.Panel(arr.reshape((1, arr.shape[0], arr.shape[1])))` or `pd.Panel(arr.reshape(( arr.shape[0], arr.shape[1],1)))`? – EdChum Oct 26 '16 at 09:56
  • @EdChum: The final shape given by the hideous pile I wrote in the question is OK. The code you wrote does produce Panels, but they are full of NaNs instead of the data from `arr`!! I'll update the question to show the results of the hideous pile. – John Zwinck Oct 26 '16 at 10:00


You need to provide a 3-D array corresponding to the items, major and minor axes of the panel object.

import numpy as np
import pandas as pd
# arr is the (3, 5) structured array from the question
# minor axis: the field names of the structured dtype
dtyp = np.array(arr.dtype.names)
# target shape: (items, major_axis, minor_axis)
dim = arr.shape[0], arr.shape[1], dtyp.shape[0]
# Flatten to a 1-D record array, build a DataFrame (one column per field),
# take its values (cast to one common dtype) and reshape them to 3-D
panel = pd.Panel(pd.DataFrame(arr.ravel()).values.reshape(dim), minor_axis=dtyp)

gives:

<class 'pandas.core.panel.Panel'>
Dimensions: 3 (items) x 5 (major_axis) x 2 (minor_axis)
Items axis: 0 to 2
Major_axis axis: 0 to 4
Minor_axis axis: x to y
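The answer's DataFrame round trip can also be done in NumPy alone, assuming NumPy 1.16 or later, where `numpy.lib.recfunctions.structured_to_unstructured` turns the record fields into a trailing axis. The sketch below (with made-up sample values) checks that both routes give the same (items, major_axis, minor_axis) array. Note that both cast every value to one common dtype, float64 here.

```python
import numpy as np
import numpy.lib.recfunctions as rfn
import pandas as pd

arr = np.zeros((3, 5), [('x', int), ('y', float)])
# Sample values (assumed) so the comparison is not trivially all zeros
arr['x'] = np.arange(15).reshape(3, 5)
arr['y'] = arr['x'] / 10

# The answer's approach: flatten, go through a DataFrame, then reshape
dtyp = np.array(arr.dtype.names)
dim = arr.shape[0], arr.shape[1], dtyp.shape[0]
via_pandas = pd.DataFrame(arr.ravel()).values.reshape(dim)

# NumPy-only route: fields become the last axis, values cast to a common dtype
via_numpy = rfn.structured_to_unstructured(arr)

print(via_numpy.shape, via_numpy.dtype)        # (3, 5, 2) float64
print(np.array_equal(via_pandas, via_numpy))   # True
```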

To convert it to a DataFrame, use the to_frame method:

panel.to_frame()
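`Panel` is no longer available in current pandas, so `to_frame` cannot be called there. Assuming `to_frame` returns rows indexed by (major, minor) with one column per item, the same long layout can be built straight from the 3-D array. This is a sketch that approximates that method's output, not its implementation; the sample values and the index names `major` and `minor` are illustrative.

```python
import numpy as np
import numpy.lib.recfunctions as rfn
import pandas as pd

arr = np.zeros((3, 5), [('x', int), ('y', float)])
# Sample values (assumed) so the layout is visible
arr['x'] = np.arange(15).reshape(3, 5)
arr['y'] = arr['x'] / 10

# 3-D data laid out as (items, major_axis, minor_axis), as in the answer
data = rfn.structured_to_unstructured(arr)
n_items, n_major, n_minor = data.shape

# Assumed to_frame() layout: rows indexed by (major, minor), one column per item.
# Move the items axis last, then collapse (major, minor) into the row axis.
values = data.transpose(1, 2, 0).reshape(n_major * n_minor, n_items)
index = pd.MultiIndex.from_product(
    [range(n_major), arr.dtype.names], names=['major', 'minor']
)
frame = pd.DataFrame(values, index=index, columns=range(n_items))

print(frame.shape)              # (10, 3)
print(frame.loc[(4, 'x'), 2])   # 14.0
```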


[Image: timings]

Nickil Maveli
  • Your way is indeed faster, but it is even less succinct than my original. You mention that a Panel requires a 3D array but clearly that is exactly what a 2D structured array is. After all, a DataFrame can be constructed from a 1D structured array. I guess this is just a shortcoming of the Panel constructor. – John Zwinck Oct 26 '16 at 12:49
  • Yeah, I agree. Currently Panel objects are low on features compared to their DataFrame/Series counterparts. There should eventually be a way to handle construction from a NumPy array with three axes. – Nickil Maveli Oct 26 '16 at 12:54
  • I just realized the other problem with your solution: it changes all the item types to float! I need to preserve the original dtypes, because in practice I also use bools, strings, datetimes, etc. – John Zwinck Oct 27 '16 at 02:26
  • The behavior is justified in the sense that the panel object constructed from your starting array contains both `int` and `float` values in the same column (due to the MultiIndex created by the minor axis). In such situations, the `dtypes` are inferred as float because of the mixed types. Hence, you get `float64` as the `dtypes` for every item. Also, the same behavior is observed with your original function. – Nickil Maveli Oct 27 '16 at 16:35
  • It turns out the behavior is not justified at all... If you look at the response to the issue I posted on GitHub (and added a link in the question), you will see that the maintainers of Pandas say Panel is deprecated and not being maintained, and people should switch to xarray (a totally different library from Pandas). Bizarre. – John Zwinck Oct 28 '16 at 13:11
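Regarding the dtype concern in these comments: any single 3-D ndarray has exactly one dtype, so mixing ints, floats and bools forces a common type. One dtype-preserving alternative, sketched here as an assumption rather than a drop-in Panel replacement, is a dict of DataFrames keyed by field name, each holding that field's (3, 5) grid in its native dtype. The `flag` field is hypothetical, standing in for the bool columns mentioned above.

```python
import numpy as np
import pandas as pd

# Hypothetical record layout mixing several field types
arr = np.zeros((3, 5), [('x', int), ('y', float), ('flag', bool)])
arr['flag'][0, 0] = True  # arr['flag'] is a view, so this writes into arr

# One 3x5 DataFrame per field: each keeps that field's own dtype, whereas a
# single 3-D ndarray would cast every value to one common dtype
frames = {name: pd.DataFrame(arr[name]) for name in arr.dtype.names}

for name, frame in frames.items():
    print(name, frame.shape, list(frame.dtypes.unique()))
```

The trade-off is that the three grids are no longer one object, so selecting a cell across all fields means indexing each frame separately.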