
I have a DataFrame something like this:

import pandas as pd

data = [['2019-01-01', .1, .2],
        ['2019-01-02', .5, .3],
        ['2019-01-03', .2, .4]]
df = pd.DataFrame(data, columns=['date', 'hour01', 'hour02'])
         date  hour01  hour02
0  2019-01-01     0.1     0.2
1  2019-01-02     0.5     0.3
2  2019-01-03     0.2     0.4

How do I melt it so that I get proper 15-minute intervals, like this?

    timestamp              value
0  2019-01-01 00:00:00     0.1
1  2019-01-01 00:15:00     0.1
2  2019-01-01 00:30:00     0.1
3  2019-01-01 00:45:00     0.1
4  2019-01-01 01:00:00     0.2
5  2019-01-01 01:15:00     0.2
6  2019-01-01 01:30:00     0.2
7  2019-01-01 01:45:00     0.2
...
16 2019-01-03 00:00:00     0.2
17 2019-01-03 00:15:00     0.2
18 2019-01-03 00:30:00     0.2
19 2019-01-03 00:45:00     0.2
20 2019-01-03 01:00:00     0.4
21 2019-01-03 01:15:00     0.4
22 2019-01-03 01:30:00     0.4
23 2019-01-03 01:45:00     0.4

Edit

df.melt(id_vars=['timestamp'], value_vars=['hour_{}'.format(str(x).zfill(2)) for x in range(1, 24)])

gives me this:

27    2017-01-28  hour_01  34.90
28    2017-01-29  hour_01  36.04
29    2017-01-30  hour_01  36.51
          ...      ...    ...
16760 2018-12-02  hour_23  51.50
16761 2018-12-03  hour_23  54.00
16762 2018-12-04  hour_23  53.87

Where to go from here?

miller

2 Answers


Maybe you can do it starting with melt too, but unless using melt is a requirement for some reason, you can obtain the result this way:

  1. Make 'date' a datetime column, if it isn't already.
  2. With groupby and apply, generate the timestamps for all the 15-minute intervals using pandas date_range, repeating each hourly value four times with numpy repeat.
  3. Finally, reset the index.

Translated into code:

import numpy as np
import pandas as pd

df['date'] = pd.to_datetime(df['date'])

ddf = df.groupby('date').apply(lambda row : pd.DataFrame(
      {'timestamp' : pd.date_range(row['date'].iloc[0], periods=4*len(df.columns[1:]), freq='15T'),
       'value' : np.repeat(np.array([row[col].iloc[0] for col in df.columns[1:]]), 4)}))
ddf.reset_index(inplace=True, drop=True)

Using your starting dataframe, ddf is:

             timestamp  value
0  2019-01-01 00:00:00    0.1
1  2019-01-01 00:15:00    0.1
2  2019-01-01 00:30:00    0.1
3  2019-01-01 00:45:00    0.1
4  2019-01-01 01:00:00    0.2
5  2019-01-01 01:15:00    0.2
6  2019-01-01 01:30:00    0.2
7  2019-01-01 01:45:00    0.2
8  2019-01-02 00:00:00    0.5
9  2019-01-02 00:15:00    0.5
10 2019-01-02 00:30:00    0.5
11 2019-01-02 00:45:00    0.5
12 2019-01-02 01:00:00    0.3
13 2019-01-02 01:15:00    0.3
14 2019-01-02 01:30:00    0.3
15 2019-01-02 01:45:00    0.3
16 2019-01-03 00:00:00    0.2
17 2019-01-03 00:15:00    0.2
18 2019-01-03 00:30:00    0.2
19 2019-01-03 00:45:00    0.2
20 2019-01-03 01:00:00    0.4
21 2019-01-03 01:15:00    0.4
22 2019-01-03 01:30:00    0.4
23 2019-01-03 01:45:00    0.4

This code automatically picks up however many columns follow 'date', assuming they are all hour columns. If other columns are mixed into the dataframe, they should be filtered out of df.columns[1:].
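For instance, a minimal sketch of that filtering (the extra 'station' column and the 'hourNN' naming pattern are illustrative assumptions, not from the question):

```python
import pandas as pd

# a frame with an extra non-hour column mixed in
df = pd.DataFrame([['2019-01-01', .1, .2, 'A'],
                   ['2019-01-02', .5, .3, 'B']],
                  columns=['date', 'hour01', 'hour02', 'station'])

# keep only the hour columns, leaving 'station' out of the reshaping
hour_cols = [c for c in df.columns if c.startswith('hour')]
# use hour_cols in place of df.columns[1:] in the groupby/apply above
```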

Valentino

Solution based on melt, set_index, resample and ffill:

import pandas as pd

df = df.melt(id_vars=['date'], var_name='hour')
df['timestamp'] = pd.to_datetime(df['date']) + pd.to_timedelta(df['hour'].str[4:].astype(int) - 1, unit='h')
df = df.set_index(pd.DatetimeIndex(df['timestamp']))
df = df.drop(columns=['timestamp', 'date', 'hour'])
df = df.resample('15T').ffill()
df = df.reset_index()

Results:

              timestamp  value
0   2019-01-01 00:00:00    0.1
1   2019-01-01 00:15:00    0.1
2   2019-01-01 00:30:00    0.1
3   2019-01-01 00:45:00    0.1
4   2019-01-01 01:00:00    0.2
..                  ...    ...
192 2019-01-03 00:00:00    0.2
193 2019-01-03 00:15:00    0.2
194 2019-01-03 00:30:00    0.2
195 2019-01-03 00:45:00    0.2
196 2019-01-03 01:00:00    0.4
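Note that resample stops at the last index value, so the output above ends at 2019-01-03 01:00:00 and the final hour's 01:15/01:30/01:45 rows are missing. A sketch of one way around this, reindexing onto a grid extended by 45 minutes (the extension amount is an assumption based on the desired output in the question):

```python
import pandas as pd

data = [['2019-01-01', .1, .2],
        ['2019-01-02', .5, .3],
        ['2019-01-03', .2, .4]]
df = pd.DataFrame(data, columns=['date', 'hour01', 'hour02'])

# same melt step as above: one row per (date, hour column)
df = df.melt(id_vars=['date'], var_name='hour')
df['timestamp'] = (pd.to_datetime(df['date'])
                   + pd.to_timedelta(df['hour'].str[4:].astype(int) - 1, unit='h'))
df = df.set_index(pd.DatetimeIndex(df['timestamp'])).drop(columns=['timestamp', 'date', 'hour'])

# build a 15-minute grid running 45 minutes past the last hourly timestamp,
# then forward-fill onto it so the last hour also gets all four intervals
full_index = pd.date_range(df.index.min(),
                           df.index.max() + pd.Timedelta(minutes=45),
                           freq='15T')
out = df.reindex(full_index).ffill().rename_axis('timestamp').reset_index()
```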
Quant Christo