10

I have a data frame containing a number of observations:

date         colour     orders
2014-10-20   red        7
2014-10-21   red        10
2014-10-20   yellow     3

I would like to re-index the data frame and standardise the dates.

date         colour     orders
2014-10-20   red        7
2014-10-21   red        10
2014-10-22   red        NaN
2014-10-20   yellow     3
2014-10-21   yellow     NaN
2014-10-22   yellow     NaN

I though to order the data frame by colour and date, and then try to re-index it.

index = pd.date_range('20/10/2014', '22/10/2014')
test_df = df.sort(['colour', 'date'], ascending=(True, True))
ts = test_df.reindex(index)
ts

But it returns a new data frame with the right index but all NaN values.

date         colour     orders
2014-10-20   NaN        NaN
2014-10-21   NaN        NaN
2014-10-22   NaN        NaN
Gianluca
  • 6,307
  • 19
  • 44
  • 65
  • What is `index` in your example? – joris Dec 11 '14 at 11:05
  • Hi Joris, I'm new to pandas. I think the initial data frame is actually not indexed at all. I've sorted it but didn't set any index. – Gianluca Dec 11 '14 at 11:09
  • But I mean, you use a variable called `index` in the line `ts = test_df.reindex(index)`. So what is that exactly? – joris Dec 11 '14 at 11:11
  • Sorry, I've edited the initial question with the line of code missing. Ideally I would let pandas find the start and end date automatically.. like the smaller and the bigger date in the data frame. I just saw the command `test_df.resample('D')` is meant to do it, but I think I should index beforehand the test_df by 'date', which I'm struggling with. – Gianluca Dec 11 '14 at 11:16

1 Answers1

18

Starting from your exampe dataframe:

In [51]: df
Out[51]:
        date  colour  orders
0 2014-10-20     red       7
1 2014-10-21     red      10
2 2014-10-20  yellow       3

If you want to reindex on both 'date' and 'colour', one possibility is to set both as the index (a multi-index):

In [52]: df = df.set_index(['date', 'colour'])

In [53]: df
Out[53]:
                   orders
date       colour
2014-10-20 red          7
2014-10-21 red         10
2014-10-20 yellow       3

You can now reindex this dataframe, after you constructed to desired index:

In [54]: index = pd.date_range('20/10/2014', '22/10/2014')

In [55]: multi_index = pd.MultiIndex.from_product([index, ['red', 'yellow']])

In [56]: df.reindex(multi_index)
Out[56]:
                   orders
2014-10-20 red          7
           yellow       3
2014-10-21 red         10
           yellow     NaN
2014-10-22 red        NaN
           yellow     NaN

To have the same output as your example output, the index should be sorted in the second level (level=1 as it is 0-based):

In [60]: df2 = df.reindex(multi_index)

In [64]: df2.sortlevel(level=1)
Out[64]:
                   orders
2014-10-20 red          7
2014-10-21 red         10
2014-10-22 red        NaN
2014-10-20 yellow       3
2014-10-21 yellow     NaN
2014-10-22 yellow     NaN

A possible way to generate the multi-index automatically would be (with your original frame):

pd.MultiIndex.from_product([pd.date_range(df['date'].min(), df['date'].max(), freq='D'), 
                            df['colour'].unique()])

Another way would be to use resample for each group of colors:

In [77]: df = df.set_index('date')

In [78]: df.groupby('colour').resample('D')

This is simpler, but this does not give you the full range of dates for each colour, only the range of dates that is available for that colour group.

joris
  • 133,120
  • 36
  • 247
  • 202
  • Let's say I have thousands of products (to be fair I also don't have just one column, but one for the category, a various sub-categories, etc..), how can I change this part of the code `multi_index = pd.MultiIndex.from_product([index, ['red', 'yellow']])`? – Gianluca Dec 11 '14 at 11:46
  • See my "A possible way to generate the multi-index automatically ..." to do this when there are a lot of values in the `colour` column – joris Dec 11 '14 at 11:47
  • @Gianluca Did this solve your problem? Or are there still problems? – joris Dec 12 '14 at 14:26