
I have a DataFrame

         date     price num_floors    house
1  2023-01-01  94.30076          3        A
2  2023-01-01  95.58771          2        B
3  2023-01-02 102.78559          1        C
4  2023-01-03  93.29053          3        D

and I want to reshape it so that each column contains the prices and num_floors of all houses for a given date. Within a column, the first two rows refer to the first house on that date, the next two to the second house, and so on. Entries without data are filled with the missing value NaN, like this:

  2023-01-01    2023-01-02  2023-01-03
1   94.30076     102.78559    93.29053
2          3             1           3         
3   95.58771            NA          NA
4          2            NA          NA

I can achieve this in R:

library(dplyr)
library(tidyr)

df_trans <- df %>%
    pivot_longer(-date) %>%
    mutate(index = row_number(), .by = date) %>%
    pivot_wider(id_cols = index, names_from = date, values_from = value) %>%
    select(-index)

but when I try the equivalent in Python, it does not produce what I want:

df_trans = df.melt(id_vars='date')
df_trans['n'] = df_trans.groupby('date').cumcount() + 1
df_trans = df_trans.pivot(index='n', columns='date', values='value')
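
For reference, a minimal sketch reconstructing the example DataFrame (the dates are assumed to be plain strings here; the real data may use datetime values):

import pandas as pd

df = pd.DataFrame(
    {
        "date": ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-03"],
        "price": [94.30076, 95.58771, 102.78559, 93.29053],
        "num_floors": [3, 2, 1, 3],
        "house": ["A", "B", "C", "D"],
    },
    index=[1, 2, 3, 4],  # index as shown in the table above
)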

jonas Chen
    Aside: your desired result suggests an unclear data model or report. You lose all meaning of the data without an index identifier. Also, there are four houses in the original data, not two as mentioned in the text. With so many whys to ask, this feels like an [XY Problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). – Parfait May 28 '23 at 22:18

3 Answers


Try:

df = df.drop(columns='house')
df['tmp'] = df.groupby('date').cumcount()  # per-date row counter: 0, 1, ...
df = (
    df.set_index(['date', 'tmp'])
      .stack()            # long Series indexed by (date, tmp, price/num_floors)
      .unstack('date')    # dates become columns, NaN where a date has fewer rows
      .reset_index(drop=True)
)
df.columns.name = None

print(df)

Prints:

   2023-01-01  2023-01-02  2023-01-03
0    94.30076   102.78559    93.29053
1     3.00000     1.00000     3.00000
2    95.58771         NaN         NaN
3     2.00000         NaN         NaN
Andrej Kesely
pd.concat([j.set_index("date")[["price", "num_floors"]].T
           for i, j in df.groupby(df.groupby("date").cumcount())])

Broken into parts:

  1. Compute a cumulative count per date and group the rows by it
  2. For each group, select price and num_floors, index by date, and transpose so dates become columns
  3. Concatenate the groups, which aligns the date columns and fills gaps with NaN (a sketch with the index cleaned up follows below)
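
A minimal sketch of the same approach with reset_index(drop=True) appended, so the repeated price/num_floors row labels left over from the transpose are dropped (assuming the df from the question):

out = pd.concat(
    [j.set_index("date")[["price", "num_floors"]].T
     for _, j in df.groupby(df.groupby("date").cumcount())]
).reset_index(drop=True)  # drop the price/num_floors labels

print(out)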
Rawson

Another possible solution:

import numpy as np

(pd.DataFrame(np.vstack([np.vstack([[x, y], [x, z]]) for x, y, z in
              zip(df['date'], df['price'], df['num_floors'])]))
 .pivot(columns=0, values=1)                     # one column per unique date
 .rename_axis(None, axis=1)
 .apply(lambda x: x.dropna(ignore_index=True)))  # compact each column upwards

Alternatively,

(df.assign(
    price=[[x, y] for x, y in zip(df['price'], df['num_floors'])])  # pack price and num_floors into one list per row
 .pivot(columns='date', values='price')
 .apply(lambda x: x.explode(ignore_index=True))   # unpack each list into two rows
 .rename_axis(None, axis=1)
 .apply(lambda x: x.dropna(ignore_index=True)))   # compact each column upwards

Output:

  2023-01-01 2023-01-02 2023-01-03
0   94.30076  102.78559   93.29053
1          3          1          3
2   95.58771        NaN        NaN
3          2        NaN        NaN
PaulS