Python Dataframe to Columnar format for accessing the columns dynamically

Question

Dataframe data will be :

COL1	COL2	COL3
100	200	300
101	201	301
102	202	302
103	203	303

Expected output: Format as HBase (Columnar fashion)

Consider COL1 as KEY_COLUMN vaue

KEY_COLUMN	KEY	VALUE
100	COL2	200
100	COL3	300
101	COL2	201
101	COL3	301
102	COL2	202
102	COL3	302
103	COL2	203
103	COL3	303

I tried using .loc which transpose/convert all key to value like below.

for idx in df.index:
     print (df.loc[idx])

COL1  100
COL2  200
COL3  300
COL1  101
COL2  201
COL3  301
COL1  102
COL2  202
COL3  302
COL1  103
COL2  203
COL3  303

But I couldn't bring the KEY_COLUMN as column 1 and KEY and VALUE as subsequent columns. Could anyone please suggest. Thanks!

score 1 · Accepted Answer · answered Jul 13 '21 at 17:17

1

Try pd.melt:

df = (
    df.rename(columns={"COL1": "KEY_COLUMN"})
    .melt("KEY_COLUMN", var_name="KEY", value_name="VALUE")
    .sort_values(by="KEY_COLUMN")
)
print(df)

Prints:

   KEY_COLUMN   KEY  VALUE
0         100  COL2    200
4         100  COL3    300
1         101  COL2    201
5         101  COL3    301
2         102  COL2    202
6         102  COL3    302
3         103  COL2    203
7         103  COL3    303

answered Jul 13 '21 at 17:17

Andrej Kesely

168,389
15
48
91

The above one really works... Also it deals with the columns dynamically... Shall we bring more than 1 column into Key_column ?? The above data frame has COL1 as KEY_COLUMN, how to bring two KEY_COLUMNS as COL1 & COL2 ? – Priyan Jul 14 '21 at 04:36

Python Dataframe to Columnar format for accessing the columns dynamically

1 Answers1