
I have a MultiIndex Series s:

import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
# index is now:
# MultiIndex(levels=[['bar', 'baz', 'foo', 'qux'], ['one', 'two']],
#            labels=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]],
#            names=['first', 'second'])
s = pd.Series(np.random.randn(8), index=index)
s

I want to add a new index level "zero" with the values x, y, z to s, matching on the existing index levels "first" and "second". In other words, I want to repeat s three times, with this additional index level holding x, y, z. I tried reindex (see below), but why does it give me all NaNs?

mux=pd.MultiIndex.from_product([["x","y","z"], 
                            s.index.get_level_values(0),
                            s.index.get_level_values(1)],
                           names=["zero","first", "second"])
t=s.reindex(mux)
t

I also tried specifying "first" and "second" as the levels to match on, but it looks like the level argument only accepts a single level?

xiaoshir

2 Answers


You can use reindex, but it is necessary to create the MultiIndex from the levels. Because this appends the new level after the existing ones, add reorder_levels and sort_index if needed:

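import numpy as np
import pandas as pd

np.random.seed(123)
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])

s = pd.Series(np.random.randn(8), index=index)
#print (s)

# Product of the unique level values, with the new level appended last
mux = pd.MultiIndex.from_product([s.index.levels[0], s.index.levels[1], ["x", "y", "z"]])
# method='ffill' takes each value from the nearest preceding (first, second) label;
# then move the new level to the front and sort
t = s.reindex(mux, method='ffill').reorder_levels([2, 0, 1]).sort_index()
print (t)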
x  bar  one   -1.085631
        two    0.997345
   baz  one    0.282978
        two   -1.506295
   foo  one   -0.578600
        two    1.651437
   qux  one   -2.426679
        two   -0.428913
y  bar  one   -1.085631
        two    0.997345
   baz  one    0.282978
        two   -1.506295
   foo  one   -0.578600
        two    1.651437
   qux  one   -2.426679
        two   -0.428913
z  bar  one   -1.085631
        two    0.997345
   baz  one    0.282978
        two   -1.506295
   foo  one   -0.578600
        two    1.651437
   qux  one   -2.426679
        two   -0.428913
dtype: float64
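A minimal sketch of a variant of the same reindex idea that does not rely on method='ffill' matching 3-level labels to 2-level ones (that matching is behavior I have not checked across pandas versions). It also illustrates why the attempt in the question returns NaN: building the product from get_level_values repeats labels, and the resulting 3-tuple labels never match the 2-tuple labels of s. The names naive, new_index and values are illustrative, and np.arange stands in for the random data.

```python
import numpy as np
import pandas as pd

# Rebuild the question's Series (deterministic values instead of randn).
arrays = [["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
          ["one", "two", "one", "two", "one", "two", "one", "two"]]
index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=["first", "second"])
s = pd.Series(np.arange(8.0), index=index)

# Why the original attempt fails:
# 1) get_level_values returns 8 labels per level (with repeats), so the
#    product has 3 * 8 * 8 = 192 entries instead of 3 * 8 = 24.
naive = pd.MultiIndex.from_product(
    [["x", "y", "z"], s.index.get_level_values(0), s.index.get_level_values(1)],
    names=["zero", "first", "second"],
)
print(len(naive))  # 192
# 2) Its labels are 3-tuples, which never match the 2-tuple labels of s.
print(("x", "bar", "one") in s.index)  # False

# Build the intended 24-row index: each existing (first, second) pair
# combined with each new key, with "zero" as the outermost level.
new_index = pd.MultiIndex.from_tuples(
    [(zero, first, second) for zero in ["x", "y", "z"] for first, second in s.index],
    names=["zero", "first", "second"],
)

# Look up values by the (first, second) part only, then attach the new index.
values = s.reindex(new_index.droplevel("zero")).to_numpy()
t = pd.Series(values, index=new_index)
print(t)
```

Because s has a unique index, reindex accepts the repeated (first, second) labels and looks each one up, so every "zero" block of t is a copy of s.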
jezrael

IIUC, you want pd.concat?

s = pd.concat([s] * 3, axis=0, keys=['x', 'y', 'z'])

If needed, rename the axis:

s = s.rename_axis(['zero', 'first', 'second'])

s 

zero  first  second
x     bar    one       0.510567
             two       0.066620
      baz    one       0.667948
             two      -1.471894
      foo    one       1.881198
             two       0.143628
      qux    one       1.108174
             two      -0.978112
y     bar    one       0.510567
             two       0.066620
      baz    one       0.667948
             two      -1.471894
      foo    one       1.881198
             two       0.143628
      qux    one       1.108174
             two      -0.978112
z     bar    one       0.510567
             two       0.066620
      baz    one       0.667948
             two      -1.471894
      foo    one       1.881198
             two       0.143628
      qux    one       1.108174
             two      -0.978112
dtype: float64
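As a small variation on the concat approach, the new level can also be named in the same call. A minimal sketch, assuming the names= argument of pd.concat: passing only names=['zero'] names the new outer level, and pandas fills in the existing names "first" and "second" for the remaining levels, so the separate rename_axis step is not needed. np.arange stands in for the random data.

```python
import numpy as np
import pandas as pd

arrays = [["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
          ["one", "two", "one", "two", "one", "two", "one", "two"]]
index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=["first", "second"])
s = pd.Series(np.arange(8.0), index=index)

# keys= adds a new outermost level; names= names it. Only one name is given,
# so the remaining level names are taken from the input index.
t = pd.concat([s] * 3, keys=["x", "y", "z"], names=["zero"])
print(t.index.names)
print(t)
```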
cs95
  • Yes, this is definitely an easier way to achieve what I wanted. Thanks. – xiaoshir Mar 29 '18 at 09:17
  • @edge27 No problem, jezrael's answer is also good, so you can upvote all answers even if you can only accept one. Cheers. – cs95 Mar 29 '18 at 09:41