
I am working with a DataFrame whose index consists of year-month strings, for example:

index = ['2007-01', '2007-03', ...]

However, the index is not complete; for example, 2007-02 is missing. I want to reindex the DataFrame with the full range of months.

What I have tried:

In [60]: pd.date_range(start='2007-01', end='2007-12', freq='M')
Out[60]: 
DatetimeIndex(['2007-01-31', '2007-02-28', '2007-03-31', '2007-04-30',
           '2007-05-31', '2007-06-30', '2007-07-31', '2007-08-31',
           '2007-09-30', '2007-10-31', '2007-11-30'],
          dtype='datetime64[ns]', freq='M')

This gives the last day of each month.

In [64]: pd.DatetimeIndex(['2007-01', '2007-03', '2007-04', '2007-05'])
Out[64]: DatetimeIndex(['2007-01-01', '2007-03-01', '2007-04-01', '2007-05-01'], dtype='datetime64[ns]', freq=None)

Parsing the strings gives the first day of each month, so the two indexes do not line up.
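A quick way to confirm the mismatch is to intersect the two indexes. This minimal sketch uses `pd.offsets.MonthEnd()` in place of the `'M'` string (newer pandas spells that alias `'ME'`) so it runs on both older and newer versions; the variable names are illustrative only.

```python
import pandas as pd

# Month-end grid, equivalent to freq='M' in the first attempt
month_ends = pd.date_range(start='2007-01', end='2007-12', freq=pd.offsets.MonthEnd())

# Parsing 'YYYY-MM' strings yields the first day of each month
parsed = pd.DatetimeIndex(['2007-01', '2007-03', '2007-04', '2007-05'])

# No timestamp appears in both, so reindexing against month_ends
# would leave every row as NaN
common = parsed.intersection(month_ends)
print(len(common))
```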

How can I handle this?

jezrael
PhilChang
  • The `'M'` frequency is the month end; for the beginning of the month use `'MS'`. See the [docs](http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases) – EdChum Sep 27 '16 at 09:14

1 Answer


I think you need to add the parameter freq='MS' if you want the first day of each month:

print(pd.date_range(start='2007-01', end='2007-12', freq='MS'))
DatetimeIndex(['2007-01-01', '2007-02-01', '2007-03-01', '2007-04-01',
               '2007-05-01', '2007-06-01', '2007-07-01', '2007-08-01',
               '2007-09-01', '2007-10-01', '2007-11-01', '2007-12-01'],
              dtype='datetime64[ns]', freq='MS')
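Applied to the original problem, this might look like the following sketch. The DataFrame and its `value` column are hypothetical sample data; the string index is parsed with `pd.to_datetime`, reindexed against the month-start range, and then formatted back to `'YYYY-MM'` strings in case the original format is wanted.

```python
import pandas as pd

# Hypothetical sample data: a string 'YYYY-MM' index with gaps
df = pd.DataFrame({'value': [1, 3, 4, 5]},
                  index=['2007-01', '2007-03', '2007-04', '2007-05'])

# Parse the strings; each becomes the first day of its month
df.index = pd.to_datetime(df.index)

# Build the complete month-start grid and reindex; missing months get NaN
full = pd.date_range(start='2007-01', end='2007-12', freq='MS')
out = df.reindex(full)

# Optionally turn the index back into 'YYYY-MM' strings
out.index = out.index.strftime('%Y-%m')
print(out)
```

The months that were missing (here 2007-02 and 2007-06 through 2007-12) appear as NaN; `reindex` also accepts `fill_value` if a different default is preferred.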

Here is the link to Offset Aliases in the pandas documentation; thank you, EdChum.

Another solution is to use a PeriodIndex to generate monthly periods:

print(pd.period_range(start='2007-01', end='2007-12', freq='M'))
PeriodIndex(['2007-01', '2007-02', '2007-03', '2007-04', '2007-05', '2007-06',
             '2007-07', '2007-08', '2007-09', '2007-10', '2007-11', '2007-12'],
            dtype='period[M]')
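With periods there is no start-versus-end ambiguity, since an `'M'` period represents the whole month. A minimal sketch, again on hypothetical sample data, converts the string index with `pd.PeriodIndex(..., freq='M')` and reindexes against `pd.period_range`:

```python
import pandas as pd

# Hypothetical sample data with a string 'YYYY-MM' index
df = pd.DataFrame({'value': [1, 3, 4, 5]},
                  index=['2007-01', '2007-03', '2007-04', '2007-05'])

# Convert the strings to monthly periods; no day-of-month is involved
df.index = pd.PeriodIndex(df.index, freq='M')

# Every month from 2007-01 through 2007-12, inclusive
full = pd.period_range(start='2007-01', end='2007-12', freq='M')
out = df.reindex(full)
print(out)
```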
jezrael