
I have a table with a factor and a time interval in each row. I want to get a long table with one row for each date in the interval between START_DATE and END_DATE, both endpoints included.

import pandas as pd

dt_in = pd.DataFrame({'factor': ['A', 'B'],
                      'START_DATE': [pd.Timestamp('2015-01-01'), pd.Timestamp('2016-02-05')],
                      'END_DATE': [pd.Timestamp('2015-01-04'), pd.Timestamp('2016-02-07')]})

    END_DATE    START_DATE  factor
0   2015-01-04  2015-01-01  A
1   2016-02-07  2016-02-05  B

I want the output table to look like this:

dt_out = pd.DataFrame({'factor': ['A','A','A','A','B','B','B'],
                   'DATE': ['2015-01-01', '2015-01-02', '2015-01-03', '2015-01-04',
                            '2016-02-05', '2016-02-06', '2016-02-07']})

    DATE    factor
0   2015-01-01  A
1   2015-01-02  A
2   2015-01-03  A
3   2015-01-04  A
4   2016-02-05  B
5   2016-02-06  B
6   2016-02-07  B

How can I do this?

jezrael
BiXiC

1 Answer


You can use melt to stack START_DATE and END_DATE into a single DATE column, then groupby with resample to fill in the dates between them:

# Parentheses are needed so the chained calls can span several lines.
df = (pd.melt(dt_in, id_vars='factor', value_name='DATE')
        .set_index('DATE')
        .drop('variable', axis=1))

print (df)
           factor
DATE             
2015-01-04      A
2016-02-07      B
2015-01-01      A
2016-02-05      B

print (df.groupby('factor')
         .resample('1D')
         .ffill()
         .reset_index(drop=True, level=0)
         .reset_index())

        DATE factor
0 2015-01-01      A
1 2015-01-02      A
2 2015-01-03      A
3 2015-01-04      A
4 2016-02-05      B
5 2016-02-06      B
6 2016-02-07      B

Notice:

Calling resample on a groupby object is new in pandas 0.18.1.
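If the grouped resample is not available, or behaves differently in your pandas version, the same table can be built without it. The following is only a sketch of an alternative, not the approach from the answer above: it assumes the `dt_in` frame from the question and creates one small frame per row with `pd.date_range`, then stacks them with `pd.concat`. The names `frames` and `dt_out` are just illustrative.

```python
import pandas as pd

dt_in = pd.DataFrame({
    'factor': ['A', 'B'],
    'START_DATE': [pd.Timestamp('2015-01-01'), pd.Timestamp('2016-02-05')],
    'END_DATE': [pd.Timestamp('2015-01-04'), pd.Timestamp('2016-02-07')],
})

# One frame per input row: the factor is broadcast over every day
# from START_DATE to END_DATE (both endpoints included).
frames = [
    pd.DataFrame({'factor': f, 'DATE': pd.date_range(s, e, freq='D')})
    for f, s, e in zip(dt_in['factor'], dt_in['START_DATE'], dt_in['END_DATE'])
]

# Stack the per-row frames and renumber the index from 0.
dt_out = pd.concat(frames, ignore_index=True)
print(dt_out)
```

This loops over rows in Python, so it is likely slower than a vectorized approach on large inputs, but it keeps DATE as a datetime column and does not depend on the order of the input rows.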

jezrael