
When converting a pandas.MultiIndex to a numpy.ndarray, the output is a one-dimensional ndarray with dtype=object, as seen in the following example:

df = pd.DataFrame({
    'A': [10, 20, 30, 40, 50, 60],
    'B': [0,1,2,3,4,5],
    'C': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5']
}).set_index(['A','B'])

The df will be:

       C
A  B
10 0  K0
20 1  K1
30 2  K2
40 3  K3
50 4  K4
60 5  K5

The output of df.index.to_numpy() is a one-dimensional ndarray of tuples with dtype=object:

array([(10, 0), (20, 1), (30, 2), (40, 3), (50, 4), (60, 5)], dtype=object)

but I want:

array([[10,  0],
       [20,  1],
       [30,  2],
       [40,  3],
       [50,  4],
       [60,  5]])

In the question "How to convert a Numpy 2D array with object dtype to a regular 2D array of floats", I found the following solution:

np.vstack(df.index)
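For anyone who wants to reproduce this, here is a minimal self-contained sketch of the `np.vstack` approach on the DataFrame from the question. The variable name `arr` is only for illustration. It relies on the fact that iterating over a MultiIndex yields one tuple per row, which `np.vstack` then stacks as rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30, 40, 50, 60],
    'B': [0, 1, 2, 3, 4, 5],
    'C': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
}).set_index(['A', 'B'])

# Each element of the MultiIndex is a tuple such as (10, 0);
# np.vstack turns every tuple into one row of a 2D array.
arr = np.vstack(df.index)

print(arr.shape)       # (6, 2)
print(arr.dtype.kind)  # 'i' (an integer dtype, since both levels hold integers)
```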

Is there any more direct or better solution?

Ali_MM
  • What's the problem with the current solution? – Pablo C Mar 03 '21 at 01:26
  • What do you mean by better? Isn't `np.vstack(df.index)` precisely the desired output? – fsl Mar 03 '21 at 01:27
  • Yeah, the current solution seems fine, but I was wondering whether there is any case where my solution won't work, or whether pandas can give me the correct output without the need for np.vstack. – Ali_MM Mar 04 '21 at 19:08
  • I was also thinking there could be a downside to my method compared to, say, @delimiter's solution below (in terms of type conversion or the like), so I thought I could have some people double-check it. – Ali_MM Mar 04 '21 at 19:16

2 Answers


I am pretty sure you will get what you want by converting the MultiIndex to a list of tuples and building a NumPy array from the result, e.g. with the following syntax:

np.array(list(df.index))
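As a rough check (a sketch, not a benchmark), the snippet below runs this on the question's DataFrame and compares it with the `np.vstack` result from the question. The names `from_list` and `from_vstack` are made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30, 40, 50, 60],
    'B': [0, 1, 2, 3, 4, 5],
    'C': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
}).set_index(['A', 'B'])

# list(df.index) is a plain list of tuples; np.array treats each
# tuple as a row, so the result is 2D rather than an object array.
from_list = np.array(list(df.index))
from_vstack = np.vstack(df.index)

print(from_list.shape)                         # (6, 2)
print(np.array_equal(from_list, from_vstack))  # True
```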
delimiter
  • This works, too. I was wondering whether this is better (faster, more applicable to all situations, etc.) than the one I found. – Ali_MM Mar 04 '21 at 19:09
  • That would be something for you to measure. There are ways of timing it, but the difference likely won't be noticeable unless your dataset is sizeable. In the meantime, don't hesitate to accept the answer of your liking. – delimiter Mar 04 '21 at 19:40
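One way to measure this is a small `timeit` harness like the sketch below. The index size `n`, the frame `df_big`, and the `candidates` dictionary are assumptions made for the example; the numbers it prints depend on the machine and on the pandas/NumPy versions, so no timings are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

n = 10_000  # arbitrary size chosen for illustration
idx = pd.MultiIndex.from_arrays(
    [np.arange(n), np.arange(n) * 10], names=['A', 'B']
)
df_big = pd.DataFrame({'C': np.zeros(n)}, index=idx)

# The three approaches mentioned in this thread.
candidates = {
    'np.vstack(index)': lambda: np.vstack(df_big.index),
    'np.array(list(index))': lambda: np.array(list(df_big.index)),
    'reset_index()[names].values': lambda: df_big.reset_index()[['A', 'B']].values,
}

# Run each once to confirm they agree, then time a few repetitions.
results = {name: fn() for name, fn in candidates.items()}
timings = {
    name: timeit.timeit(fn, number=5) / 5 for name, fn in candidates.items()
}

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.2f} ms per call")
```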

Turn the index levels into columns, then select them:

df.reset_index()[['A', 'B']].values
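A slightly more general sketch of the same idea, which reads the level names from `df.index.names` instead of hard-coding them. The `MultiIndex.to_frame(index=False)` variant shown alongside is an addition of mine rather than part of this answer; it skips the round trip through the full DataFrame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30, 40, 50, 60],
    'B': [0, 1, 2, 3, 4, 5],
    'C': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
}).set_index(['A', 'B'])

# Same approach as above, but without hard-coding the level names.
names = list(df.index.names)
via_reset = df.reset_index()[names].to_numpy()

# Variant (my addition): convert only the index into a DataFrame.
via_to_frame = df.index.to_frame(index=False).to_numpy()

print(via_reset.shape)                         # (6, 2)
print(np.array_equal(via_reset, via_to_frame)) # True
```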
Ferris
  • This can work, too. I'm still wondering which method is better, faster, or more general. For example, is it possible that one of the solutions given so far changes the dtype of the cells (e.g. from int to float or the other way around)? – Ali_MM Mar 04 '21 at 19:17
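The dtype question can be checked directly. The sketch below is my own test setup, not something from the answers: the helper `index_to_2d` and the frames `num` and `mixed` are hypothetical names. It applies the three approaches from this thread to an index whose levels have different types and reports the resulting `dtype.kind` of each array.

```python
import numpy as np
import pandas as pd

def index_to_2d(df):
    """Apply the three approaches from this thread to df.index."""
    names = list(df.index.names)
    return {
        'vstack': np.vstack(df.index),
        'list': np.array(list(df.index)),
        'reset_index': df.reset_index()[names].to_numpy(),
    }

# Case 1: one integer level and one float level.
num = pd.DataFrame(
    {'A': [10, 20], 'B': [0.5, 1.5], 'C': ['K0', 'K1']}
).set_index(['A', 'B'])

# Case 2: one integer level and one string level.
mixed = pd.DataFrame(
    {'A': [10, 20], 'B': ['x', 'y'], 'C': ['K0', 'K1']}
).set_index(['A', 'B'])

num_kinds = {k: v.dtype.kind for k, v in index_to_2d(num).items()}
mixed_kinds = {k: v.dtype.kind for k, v in index_to_2d(mixed).items()}

print(num_kinds)    # every approach gives 'f': the integers are upcast to float
print(mixed_kinds)  # 'U' for vstack and list, 'O' for reset_index
```

As far as I can tell, a 2D NumPy array has a single dtype, so none of the approaches can keep an int level and a float level separate: all three upcast to float. The approaches do differ when a level holds strings: the two tuple-based ones convert the integers to strings, while the `reset_index` route returns an object array that keeps the original Python values.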