pair wise melt in pandas dataframe

Question

I want to pair wise create a new dataframe. I tried using MELT but it did not really work. If you notice columns are pair wise ( i.e code:type , code1:type1 ) I tried creating a list of code columns and list of type columns and then do melt.

How would you suggest this issue? I would appreciate if some explanation is given.

ORIGINAL DATAFRAME:

id   Name  code code1 code2  type type1 type2
 0    ABC    1      4     8   S     E      T
 1    XYZ    2            5   R            U

EXPECTED DATAFRAME

id name  Code_All Type_All
0   ABC      1      S
0   ABC      4      E
0   ABC      8      T
1   XYZ      2      R
1   XYZ               
1   XYZ      5      U

could you please guide in the case of 'paired' columns. can i do in a single line code or should it be in a loop? — Sai Astro, Feb 27 '20 at 19:20
does column `code 2` has a space as shown? or is it a typo and should be `code2` ? — anky, Feb 27 '20 at 19:37
hi @anky_91 yes it was a typo. Your response would be appericiated. — Sai Astro, Feb 28 '20 at 06:01
@anky_91 I tried df.set_index(['ID','Name']).stack().reset_index() which gives me but I need to have single column for code columns and separate column for type columns. — Sai Astro, Feb 28 '20 at 09:22

score 0 · Answer 1 · answered Feb 28 '20 at 09:38

0

Here problem is we don't have a clear suffix (first entry for every group is without a suffix) for pd.wide_to_long , hence modified the columns first and then applied pd.wide_to_long:

m  = df.set_index(['id','Name'])
cols = m.columns.str.replace('\d+','').to_series()
m.columns=cols.add(cols.groupby(cols).cumcount().astype(str))

out = (pd.wide_to_long(m.reset_index(),['code','type'],['id','Name'],'idx')
                                              .reset_index(['id','Name']))

     id Name  code type
idx                    
0     0  ABC   1.0    S
1     0  ABC   4.0    E
2     0  ABC   8.0    T
0     1  XYZ   2.0    R
1     1  XYZ   NaN  NaN
2     1  XYZ   5.0    U

Code to reproduce the dataframe:

d = {'id': {0: 0, 1: 1},
  'Name': {0: 'ABC', 1: 'XYZ'},
  'code': {0: 1, 1: 2},
  'code1': {0: 4.0, 1: np.nan},
  'code2': {0: 8, 1: 5},
  'type': {0: 'S', 1: 'R'},
  'type1': {0: 'E', 1: np.nan},
  'type2': {0: 'T', 1: 'U'}}
df = pd.DataFrame(d)

answered Feb 28 '20 at 09:38

anky

74,114
11
41
70

I am unable to understand the logic about the name of columns and how you are separating it. The original columns are ID, Name, Attributes.ProducerCode.ProducerCode, Attributes.ProducerCode1.ProducerCode, Attributes.ProducerCode2.ProducerCode, Attributes.ProducerPartyRole.ProducerPartyRoleType, Attributes.ProducerPartyRole2.ProducerPartyRoleType, Attributes.ProducerPartyRole3.ProducerPartyRoleType. – Sai Astro Feb 28 '20 at 11:41
Btw, thank you so much for your patience. You are saving my butt at work! ( deadlines argh ! ) – Sai Astro Feb 28 '20 at 11:41
For wide to long to work, we would need the suffixes as numeric in an increasing manner, check how the column of m are before the wide to long line – anky Feb 28 '20 at 12:01
Hi, I tried ! no output. Check my picture, sorry cant take snippet. tinyurl.com/widelongso as you can see I renamed the columns too ! – Sai Astro Feb 28 '20 at 12:42

score 0 · Accepted Answer · answered Mar 04 '20 at 21:21

After few frustrating hours, I was able to complete this using a dirty trick.

I split the dataset in two.

id, name, code1, code2, code3

id, name, type1, type2, type3

I transposed the two dataframes

Example of first dataframe.

id, name, code1

id, name, code2

id, name, code3

Using the pd.concat([df1.df2], axix=1) I was able to concatenate it and removed NULL values and duplicate column names

FINAL DATAFRAME

id, name, code1, type1

id, name, code2, type2

id, name, code3, type3

pair wise melt in pandas dataframe

2 Answers2