2

I have a dataframe pivot table with 2 level of index: month and rating. The rating should be 1,2,3 (not to be confused with the columns 1,2,3). I found that for some months, the rating could be missing. E.g, (Population and 2021-10) only has rating 1,2. I need every month to have ratings 1,2,3. So I need to fill 0 for the missing rating index.

        tbl = pd.pivot_table(self.df, values=['ID'], index=['month', 'risk'],
                             columns=["Factor"], aggfunc='count', fill_value=0)
        tbl = tbl.droplevel(None, axis=1).rename_axis(None, axis=1).rename_axis(index={'month': None,
                                                                                       'Risk': 'Client Risk Rating'})

# show Low for rating 1, Moderate for rating 2, Potential High for rating 3
    rating = {1: 'Low',
              2: 'Moderate',
              3: 'Potential High'
              }
    pop = {'N': 'Refreshed Clients', 'Y': 'Population'}
        tbl.rename(index={**rating,**pop}, inplace=True)
        tbl = tbl.applymap(lambda x: x.replace(',', '')).astype(np.int64)
        tbl = tbl.div(tbl.sum(axis=1), axis=0)
        # client risk rating may be missing (e.g., only 1,2). 
        # To draw, need to fill the missing client risk rating with 0 
        print("before",tbl)
        tbl=tbl.reindex(pd.MultiIndex.from_product(tbl.index.levels), fill_value=0)
        print("after pd.MultiIndex.from_product",tbl)

I have used pd.MultiIndex.from_product. It does not work when all data is missing one index. For example, population has Moderate, 2021-03 and 2021-04 have Low and Moderate. After pd.MultiIndex.from_product, population has Low and Moderate, but all are missing High. My question is to have every month with risk 1,2,3. It seems the index values are from data. enter image description here

enter image description here

enter image description here

Summer
  • 241
  • 2
  • 10
  • 23
  • 2
    Don't include your data as picture. Do include code to generate your data, especially for multiIndex data, e.g. `df.to_dict()`. – Quang Hoang Jul 02 '22 at 06:07

1 Answers1

3

You can use pd.MultiIndex.from_product to create a full index:

>>> df
                             1         2         3
(Population)       1  0.436954  0.897747  0.387058
                   2  0.464940  0.611953  0.133941
2021-08(Refreshed) 1  0.496111  0.282798  0.048384
                   2  0.163582  0.213310  0.504647
                   3  0.008980  0.651175  0.400103

>>> df.reindex(pd.MultiIndex.from_product(df.index.levels), fill_value=0)
                             1         2         3
(Population)       1  0.436954  0.897747  0.387058
                   2  0.464940  0.611953  0.133941
                   3  0.000000  0.000000  0.000000  # New record
2021-08(Refreshed) 1  0.496111  0.282798  0.048384
                   2  0.163582  0.213310  0.504647
                   3  0.008980  0.651175  0.400103

Update

I wonder df=df.reindex([1,2,3],level='rating',fill_value=0) doesn't work because the new index values [1,2,3] cannot fill the missing values for the previous rating index. By using the from_product, it creates the product of two index.

In fact it works. I mean it has an effect but not the one you expect. The method reindex the level not the values. Let me show you:

# It seems there is not effect because you don't see 3 and 4 as expected?
>>> df.reindex([1, 2, 3, 4], level='ratings')
                                   0         1         2
                   ratings                              
(Population)       1        0.536154  0.671380  0.839362
                   2        0.729484  0.512379  0.440018
2021-08(Refreshed) 1        0.279990  0.295757  0.405536
                   2        0.864217  0.798092  0.144219
                   3        0.214566  0.407581  0.736905

# But yes something happens
>>> df.reindex([1, 2, 3, 4], level='ratings').index.levels
FrozenList([['(Population)', '2021-08(Refreshed)'], [1, 2, 3, 4]])
                              The level has been reindexed ---^

# It's different from values
>>> df.reindex([1, 2, 3, 4], level='ratings').index.get_level_values('ratings')
Int64Index([1, 2, 1, 2, 3], dtype='int64', name='ratings')
Corralien
  • 109,409
  • 8
  • 28
  • 52
  • Worked! Thank you so much. I wonder df=df.reindex([1,2,3],level='rating',fill_value=0) doesn't work because the new index values [1,2,3] cannot fill the missing values for the previous rating index. By using the from_product, it creates the product of two index. – Summer Jul 02 '22 at 18:08
  • @Summer. I updated my answer with an explanation. I hope it's clearer now? – Corralien Jul 02 '22 at 19:50
  • Thank you!!! I've used pd.MultiIndex.from_product to create a full index. It worked for different cases. But one case still missed one index value. In your code example, that is population 3 0.000000 0.000000 0.000000 # New record. Do you know the possible reason for this? – Summer Jul 22 '22 at 05:23
  • If you use `pd.MultiIndex.from_product` as suggested, it should be not possible to miss a combination. Do you have an example? – Corralien Jul 22 '22 at 05:33
  • Hi Corralien, I've edited my question with code and example. Basically speaking, when all data is missing rate 3, even after ```python pd.MultiIndex.from_product```, all data is missing rate 3. – Summer Jul 22 '22 at 15:31
  • As you already know the values for the second level, you can use `df.reindex(pd.MultiIndex.from_product([df.index.levels[0], rating.values()]), fill_value=0)` – Corralien Jul 22 '22 at 19:30