Converting pandas.MultiIndex to numpy.ndarray with dtype float

Question

When converting a pandas.MultiIndex to a numpy.ndarray, the output is a one-dimensional ndarray with dtype=object, as seen in the following example:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30, 40, 50, 60],
    'B': [0, 1, 2, 3, 4, 5],
    'C': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5']
}).set_index(['A', 'B'])

The df will be:

       C
A  B
10 0  K0
20 1  K1
30 2  K2
40 3  K3
50 4  K4
60 5  K5

The output of df.index.to_numpy() is a one-dimensional ndarray of tuples with dtype=object:

array([(10, 0), (20, 1), (30, 2), (40, 3), (50, 4), (60, 5)], dtype=object)

But what I want is a two-dimensional array:

array([[10,  0],
       [20,  1],
       [30,  2],
       [40,  3],
       [50,  4],
       [60,  5]])

I can get this with:

np.vstack(df.index)

Is there a more direct or better solution?

What do you mean by better? Isn't `np.vstack(df.index)` precisely the desired output? — fsl, Mar 03 '21 at 01:27
Yeah, the current solution seems fine, but I was wondering whether there is any case where it won't work, or whether pandas can give me the correct output without the need for np.vstack. — Ali_MM, Mar 04 '21 at 19:08
I was also thinking there could be a downside to my method compared to, say, @delimiter's solution below (in terms of type conversion or the like), so I thought I'd have some people double-check it. — Ali_MM, Mar 04 '21 at 19:16

score 2 · Accepted Answer · answered Mar 03 '21 at 02:09

I am pretty sure you will get what you want by flattening the MultiIndex into a list of tuples and building a NumPy array from the result, for example:

np.array(list(df.index))
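
As a quick check of this suggestion, here is a minimal sketch that rebuilds the question's df and compares the result with np.vstack(df.index). The variable names arr and arr_float are only illustrative, and the final cast to float is an assumption based on the title rather than something the answer itself does.

```python
import numpy as np
import pandas as pd

# Rebuild the DataFrame from the question.
df = pd.DataFrame({
    'A': [10, 20, 30, 40, 50, 60],
    'B': [0, 1, 2, 3, 4, 5],
    'C': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
}).set_index(['A', 'B'])

# list(df.index) is a list of tuples; np.array stacks them into a 2-D array.
arr = np.array(list(df.index))

# Both levels are integers here, so the array is an integer array.
# An explicit cast gives the float dtype mentioned in the title.
arr_float = arr.astype(float)

print(arr.shape, arr.dtype, arr_float.dtype)
print(np.array_equal(arr, np.vstack(df.index)))
```

For this particular index the two approaches should give the same values, so the choice between them looks mostly like a matter of taste.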

delimiter

This works, too. I was wondering whether this is better (faster, more applicable to all situations, etc.) than the one I found. – Ali_MM Mar 04 '21 at 19:09
That is something you would have to measure. There are ways of doing that, but the difference likely won't be noticeable unless your dataset is sizeable. In the meantime, don't hesitate to accept whichever response you like. – delimiter Mar 04 '21 at 19:40
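
One way to do that measurement is with the standard-library timeit module. The sketch below is only a rough illustration: the larger frame big, its size n, and the candidates dictionary are made up for the example, and the absolute numbers will depend on the machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger frame with the same structure as the question's df.
n = 5_000
big = pd.DataFrame({
    'A': np.arange(n) * 10,
    'B': np.arange(n),
    'C': ['x'] * n,
}).set_index(['A', 'B'])

# The three approaches proposed in this thread.
candidates = {
    'np.vstack(index)': lambda: np.vstack(big.index),
    'np.array(list(index))': lambda: np.array(list(big.index)),
    'reset_index()[cols].values': lambda: big.reset_index()[['A', 'B']].values,
}

# Best of 3 repeats, each repeat calling the function 5 times.
timings = {
    name: min(timeit.repeat(fn, number=5, repeat=3))
    for name, fn in candidates.items()
}

for name, seconds in timings.items():
    print(f'{name}: {seconds:.4f} s for 5 calls')
```

Taking the minimum over the repeats is the usual way to reduce noise from other processes; it says nothing about memory use, which would need a separate check.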

score 2 · Answer 2 · answered Mar 03 '21 at 02:23

Turn the index levels into columns and take their values:

df.reset_index()[['A', 'B']].values
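
A variation on the same idea, offered as a sketch rather than as part of this answer: MultiIndex.to_frame builds a DataFrame from the index levels only, so the data column C never has to be selected away, and DataFrame.to_numpy accepts a dtype argument for the float conversion asked about in the title. The names levels and arr_float are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the DataFrame from the question.
df = pd.DataFrame({
    'A': [10, 20, 30, 40, 50, 60],
    'B': [0, 1, 2, 3, 4, 5],
    'C': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
}).set_index(['A', 'B'])

# One column per index level; index=False gives the frame a plain RangeIndex.
levels = df.index.to_frame(index=False)

# Convert to a 2-D array and request float64 in the same call.
arr_float = levels.to_numpy(dtype=float)

print(arr_float.shape, arr_float.dtype)
```

Requesting the dtype explicitly would raise an error if a level held non-numeric values, which may be preferable to a silent conversion.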

Ferris

This can work, too. I'm still wondering which method is better, faster, or more general. For example, is it possible that one of the solutions given so far changes the dtype of the cells (e.g. from int to float or the other way around)? – Ali_MM Mar 04 '21 at 19:17
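
The dtype question can be probed directly. The sketch below uses two made-up indexes (int_float and int_str) and a hypothetical helper three_ways; none of these appear in the answers above. A NumPy array has a single dtype, so when the levels differ in type some conversion is unavoidable, and the approaches may not all pick the same one.

```python
import numpy as np
import pandas as pd

def three_ways(index):
    """Return the 2-D arrays produced by the three approaches in this thread."""
    frame = pd.DataFrame({'C': list(range(len(index)))}, index=index)
    return {
        'vstack': np.vstack(frame.index),
        'array_list': np.array(list(frame.index)),
        'reset_index': frame.reset_index()[list(index.names)].values,
    }

# Hypothetical index: one int level and one float level.
int_float = pd.MultiIndex.from_arrays([[10, 20], [0.5, 1.5]], names=['A', 'B'])

# Hypothetical index: one int level and one string level.
int_str = pd.MultiIndex.from_arrays([[10, 20], ['x', 'y']], names=['A', 'B'])

dtypes_int_float = {k: v.dtype for k, v in three_ways(int_float).items()}
dtypes_int_str = {k: v.dtype for k, v in three_ways(int_str).items()}

print(dtypes_int_float)  # the int level is upcast to float in all three
print(dtypes_int_str)    # NumPy stacking yields strings, pandas yields object
```

With an int and a float level, all three results are float64, so the integers are silently upcast. With a string level, the two NumPy-based approaches convert the integers to strings, while the reset_index route returns an object array that keeps the original Python values. When both levels are integers, as in the question, none of the three changes the dtype.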
