2

I have a table like this

device_type version pool testMean testP50 testP90 testP99 testStd WidgetMean WidgetP50 WidgetP90 WidgetP99 WidgetStd
PNB0Q7 8108162 123 124 136 140.8 141.88 21.35 2.2 0 6.4 9.64 3.92

I want it to be transformed like this:

device_type version pool Name Mean P50 P90 P99 Std
PNB0Q7 8108162 123 test 123 136 140.8 142.88 21.35
PNB0Q7 8108162 123 Widget 2.2 0 6.4 9.64 3.92

I tried using melt but getting:

df.melt(id_vars=["device_type", "version", "pool"], var_name="Name", value_name="Value")
device_type version pool Name Value
PNB0Q7 8108162 test testMean 124.00
PNB0Q7 8108162 test testP50 136.00
PNB0Q7 8108162 test testP90 140.80
PNB0Q7 8108162 test testP99 141.88
PNB0Q7 8108162 test testStd 21.35

Any idea on how to reach to expected solution

Anonymous
  • 835
  • 1
  • 5
  • 21
Aditya
  • 818
  • 1
  • 10
  • 21

3 Answers3

3

You can do this with pd.wide_to_long and a little column naming cleanup first, then reshape:

df = df.rename(columns={'Std':'testStd',
                        'TestP90':'testP90',
                        'TestP99':'testP99', 
                        'TestP50':'testP50'})
df_out = pd.wide_to_long(df, 
                         ['test','Widget'], 
                         ['device_type', 'version', 'pool'], 
                         'Measure', '', '.+' )
df_out = df_out.unstack(-1).stack(0).reset_index()
df_out

Output:

Measure device_type  version  pool level_3   Mean    P50    P90     P99    Std
0            PNB0Q7  8108162   123  Widget    2.2    0.0    6.4    9.64   3.92
1            PNB0Q7  8108162   123    test  124.0  136.0  140.8  141.88  21.35

Update renaming 'level_3' above:

df = df.rename(columns={'Std':'testStd',
                        'TestP90':'testP90',
                        'TestP99':'testP99', 
                        'TestP50':'testP50'})
df_out = pd.wide_to_long(df, 
                         ['test','Widget'], 
                         ['device_type', 'version', 'pool'], 
                         'Measure', '', '.+' )\
            .rename_axis('Instrument', axis=1) #add this line to rename column header axis
df_out = df_out.unstack(-1).stack(0).reset_index()
df_out

Output:

Measure device_type  version  pool Instrument   Mean    P50    P90     P99    Std
0            PNB0Q7  8108162   123     Widget    2.2    0.0    6.4    9.64   3.92
1            PNB0Q7  8108162   123       test  124.0  136.0  140.8  141.88  21.35
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
  • `pd.wide_to_long` is good for situation where you need to do simultaneous melting on columns. – Scott Boston May 12 '22 at 23:28
  • 1
    I'd suggest flipping the columns, so test/widget is at the end, and then run wide_to_long, that way you do not need the stack/unstack combination, which only makes the process more expensive (current steps does a wide to long to wide to long, three steps) – sammywemmy May 13 '22 at 01:30
  • I am getting exception with large data File "/Users/jaiadi/Desktop/PostProcessAnalysisFile.py", line 29, in df_out = df_out.unstack(-1).stack(0).reset_index() return _Unstacker( File "/usr/local/lib/python3.9/site-packages/pandas/core/reshape/reshape.py", line 120, in __init__ self._make_selectors() File "/usr/local/lib/python3.9/site-packages/pandas/core/reshape/reshape.py", line 169, in _make_selectors raise ValueError("Index contains duplicate entries, cannot reshape") ValueError: Index contains duplicate entries, cannot reshape – Aditya May 13 '22 at 02:17
  • With duplicate entries you are going to have to uses groupby and aggregate like values with sum, mean, first or some sort of aggregation function. – Scott Boston May 13 '22 at 02:55
  • Can we rename the column name also e.g. level_3 in above example – Aditya May 13 '22 at 06:03
  • @Aditya See update. – Scott Boston May 13 '22 at 12:46
1
df.columns = ['device_type', 'version', 'pool', 'Mean', 'P50', 'P90', 'P99', 'Std']
df['Name'] = 'test'
df = df[['device_type', 'version', 'pool', 'Name', 'Mean', 'P50', 'P90', 'P99', 'Std']]
print(df)

Output:

  device_type  version  pool  Name  Mean  P50    P90     P99    Std
0      PNB0Q7  8108162   123  test   124  136  140.8  141.88  21.35
BeRT2me
  • 12,699
  • 2
  • 13
  • 31
  • Is there a way if i have multiple entry? How to put multiple names? device_type version pool testMean TestP50 TestP90 TestP99 Std WidgetMean WidgetP50 WidgetP90 WidgetP99 WidgetStd PNB0Q7 8108162 123 124 136 140.8 141.88 21.35 2.2 0 6.4 9.64 3.92 – Aditya May 12 '22 at 23:00
  • 1
    Editing the full version of whatever it is you want to do into your question sounds like a good idea. – BeRT2me May 12 '22 at 23:04
1

One option is to transform to long form with pivot_longer from pyjanitor, using the .value placeholder ---> the .value determines which parts of the columns remain as headers. First we need to ensure that Test is lowercase:

# pip install pyjanitor
import pandas as pd
import janitor

df.columns = df.columns.str.replace('Test', 'test')

df

  device_type  version  pool  testMean  testP50  testP90  testP99    Std
0      PNB0Q7  8108162   123       124      136    140.8   141.88  21.35

df.pivot_longer(
     column_names = 'test*', 
     names_to = ('Name', '.value'), 
    names_pattern = r"(test)(.+)"
   )

  device_type  version  pool    Std  Name  Mean  P50    P90     P99
0      PNB0Q7  8108162   123  21.35  test   124  136  140.8  141.88

With the updated data, the same concept applies; however, you need to arrange your columns properly - get the Test as lowercase, change Std to testStd:

df.columns = df.columns.str.replace('Test', 'test')
df = df.rename(columns = {'Std': 'testStd'})
df

  device_type  version  pool  testMean  testP50  testP90  testP99  testStd  WidgetMean  WidgetP50  WidgetP90  WidgetP99  WidgetStd
0      PNB0Q7  8108162   123       124      136    140.8   141.88    21.35         2.2          0        6.4       9.64       3.92

df.pivot_longer(
     column_names = ['test*', 'Widget*'], 
     names_to = ('Name', '.value'), 
     names_pattern = r"(test|Widget)(.+)"
    )

  device_type  version  pool    Name   Mean  P50    P90     P99    Std
0      PNB0Q7  8108162   123    test  124.0  136  140.8  141.88  21.35
1      PNB0Q7  8108162   123  Widget    2.2    0    6.4    9.64   3.92
sammywemmy
  • 27,093
  • 4
  • 17
  • 31
  • Is there a way if i have multiple entry? How to put multiple column names? e.g. device_type version pool testMean TestP50 TestP90 TestP99 Std WidgetMean WidgetP50 WidgetP90 WidgetP99 WidgetStd PNB0Q7 8108162 123 124 136 140.8 141.88 21.35 2.2 0 6.4 9.64 3.92 – Aditya May 12 '22 at 23:02