How do I melt a pandas with custom nam

Question

I have a table like this

device_type	version	pool	testMean	testP50	testP90	testP99	testStd	WidgetMean	WidgetP50	WidgetP90	WidgetP99	WidgetStd
PNB0Q7	8108162	123	124	136	140.8	141.88	21.35	2.2	0	6.4	9.64	3.92

I want it to be transformed like this:

device_type	version	pool	Name	Mean	P50	P90	P99	Std
PNB0Q7	8108162	123	test	123	136	140.8	142.88	21.35
PNB0Q7	8108162	123	Widget	2.2	0	6.4	9.64	3.92

I tried using melt but getting:

df.melt(id_vars=["device_type", "version", "pool"], var_name="Name", value_name="Value")

device_type	version	pool	Name	Value
PNB0Q7	8108162	test	testMean	124.00
PNB0Q7	8108162	test	testP50	136.00
PNB0Q7	8108162	test	testP90	140.80
PNB0Q7	8108162	test	testP99	141.88
PNB0Q7	8108162	test	testStd	21.35

Any idea on how to reach to expected solution

Scott Boston · Accepted Answer · 2022-05-13T12:46:04.933

3

You can do this with pd.wide_to_long and a little column naming cleanup first, then reshape:

df = df.rename(columns={'Std':'testStd',
                        'TestP90':'testP90',
                        'TestP99':'testP99', 
                        'TestP50':'testP50'})
df_out = pd.wide_to_long(df, 
                         ['test','Widget'], 
                         ['device_type', 'version', 'pool'], 
                         'Measure', '', '.+' )
df_out = df_out.unstack(-1).stack(0).reset_index()
df_out

Output:

Measure device_type  version  pool level_3   Mean    P50    P90     P99    Std
0            PNB0Q7  8108162   123  Widget    2.2    0.0    6.4    9.64   3.92
1            PNB0Q7  8108162   123    test  124.0  136.0  140.8  141.88  21.35

Update renaming 'level_3' above:

df = df.rename(columns={'Std':'testStd',
                        'TestP90':'testP90',
                        'TestP99':'testP99', 
                        'TestP50':'testP50'})
df_out = pd.wide_to_long(df, 
                         ['test','Widget'], 
                         ['device_type', 'version', 'pool'], 
                         'Measure', '', '.+' )\
            .rename_axis('Instrument', axis=1) #add this line to rename column header axis
df_out = df_out.unstack(-1).stack(0).reset_index()
df_out

Output:

Measure device_type  version  pool Instrument   Mean    P50    P90     P99    Std
0            PNB0Q7  8108162   123     Widget    2.2    0.0    6.4    9.64   3.92
1            PNB0Q7  8108162   123       test  124.0  136.0  140.8  141.88  21.35

edited May 13 '22 at 12:46

answered May 12 '22 at 23:25

Scott Boston

147,308
15
139
187

`pd.wide_to_long` is good for situation where you need to do simultaneous melting on columns. – Scott Boston May 12 '22 at 23:28
1

I'd suggest flipping the columns, so test/widget is at the end, and then run wide_to_long, that way you do not need the stack/unstack combination, which only makes the process more expensive (current steps does a wide to long to wide to long, three steps) – sammywemmy May 13 '22 at 01:30
I am getting exception with large data File "/Users/jaiadi/Desktop/PostProcessAnalysisFile.py", line 29, in df_out = df_out.unstack(-1).stack(0).reset_index() return _Unstacker( File "/usr/local/lib/python3.9/site-packages/pandas/core/reshape/reshape.py", line 120, in __init__ self._make_selectors() File "/usr/local/lib/python3.9/site-packages/pandas/core/reshape/reshape.py", line 169, in _make_selectors raise ValueError("Index contains duplicate entries, cannot reshape") ValueError: Index contains duplicate entries, cannot reshape – Aditya May 13 '22 at 02:17
With duplicate entries you are going to have to uses groupby and aggregate like values with sum, mean, first or some sort of aggregation function. – Scott Boston May 13 '22 at 02:55
Can we rename the column name also e.g. level_3 in above example – Aditya May 13 '22 at 06:03
@Aditya See update. – Scott Boston May 13 '22 at 12:46

score 1 · Answer 2 · answered May 12 '22 at 22:33

1

df.columns = ['device_type', 'version', 'pool', 'Mean', 'P50', 'P90', 'P99', 'Std']
df['Name'] = 'test'
df = df[['device_type', 'version', 'pool', 'Name', 'Mean', 'P50', 'P90', 'P99', 'Std']]
print(df)

Output:

  device_type  version  pool  Name  Mean  P50    P90     P99    Std
0      PNB0Q7  8108162   123  test   124  136  140.8  141.88  21.35

answered May 12 '22 at 22:33

BeRT2me

12,699
2
13
31

Is there a way if i have multiple entry? How to put multiple names? device_type version pool testMean TestP50 TestP90 TestP99 Std WidgetMean WidgetP50 WidgetP90 WidgetP99 WidgetStd PNB0Q7 8108162 123 124 136 140.8 141.88 21.35 2.2 0 6.4 9.64 3.92 – Aditya May 12 '22 at 23:00
1

Editing the full version of whatever it is you want to do into your question sounds like a good idea. – BeRT2me May 12 '22 at 23:04

sammywemmy · Answer 3 · 2022-05-13T01:31:28.593

One option is to transform to long form with pivot_longer from pyjanitor, using the .value placeholder ---> the .value determines which parts of the columns remain as headers. First we need to ensure that Test is lowercase:

# pip install pyjanitor
import pandas as pd
import janitor

df.columns = df.columns.str.replace('Test', 'test')

df

  device_type  version  pool  testMean  testP50  testP90  testP99    Std
0      PNB0Q7  8108162   123       124      136    140.8   141.88  21.35

df.pivot_longer(
     column_names = 'test*', 
     names_to = ('Name', '.value'), 
    names_pattern = r"(test)(.+)"
   )

  device_type  version  pool    Std  Name  Mean  P50    P90     P99
0      PNB0Q7  8108162   123  21.35  test   124  136  140.8  141.88

With the updated data, the same concept applies; however, you need to arrange your columns properly - get the Test as lowercase, change Std to testStd:

df.columns = df.columns.str.replace('Test', 'test')
df = df.rename(columns = {'Std': 'testStd'})
df

  device_type  version  pool  testMean  testP50  testP90  testP99  testStd  WidgetMean  WidgetP50  WidgetP90  WidgetP99  WidgetStd
0      PNB0Q7  8108162   123       124      136    140.8   141.88    21.35         2.2          0        6.4       9.64       3.92

df.pivot_longer(
     column_names = ['test*', 'Widget*'], 
     names_to = ('Name', '.value'), 
     names_pattern = r"(test|Widget)(.+)"
    )

  device_type  version  pool    Name   Mean  P50    P90     P99    Std
0      PNB0Q7  8108162   123    test  124.0  136  140.8  141.88  21.35
1      PNB0Q7  8108162   123  Widget    2.2    0    6.4    9.64   3.92

Is there a way if i have multiple entry? How to put multiple column names? e.g. device_type version pool testMean TestP50 TestP90 TestP99 Std WidgetMean WidgetP50 WidgetP90 WidgetP99 WidgetStd PNB0Q7 8108162 123 124 136 140.8 141.88 21.35 2.2 0 6.4 9.64 3.92 — Aditya, May 12 '22 at 23:02

How do I melt a pandas with custom nam

3 Answers3