1

I want to pair wise create a new dataframe. I tried using MELT but it did not really work. If you notice columns are pair wise ( i.e code:type , code1:type1 ) I tried creating a list of code columns and list of type columns and then do melt.

How would you suggest this issue? I would appreciate if some explanation is given.

ORIGINAL DATAFRAME:

id   Name  code code1 code2  type type1 type2
 0    ABC    1      4     8   S     E      T
 1    XYZ    2            5   R            U

EXPECTED DATAFRAME

id name  Code_All Type_All
0   ABC      1      S
0   ABC      4      E
0   ABC      8      T
1   XYZ      2      R
1   XYZ               
1   XYZ      5      U
Sai Astro
  • 113
  • 1
  • 7

2 Answers2

0

Here problem is we don't have a clear suffix (first entry for every group is without a suffix) for pd.wide_to_long , hence modified the columns first and then applied pd.wide_to_long:

m  = df.set_index(['id','Name'])
cols = m.columns.str.replace('\d+','').to_series()
m.columns=cols.add(cols.groupby(cols).cumcount().astype(str))

out = (pd.wide_to_long(m.reset_index(),['code','type'],['id','Name'],'idx')
                                              .reset_index(['id','Name']))

     id Name  code type
idx                    
0     0  ABC   1.0    S
1     0  ABC   4.0    E
2     0  ABC   8.0    T
0     1  XYZ   2.0    R
1     1  XYZ   NaN  NaN
2     1  XYZ   5.0    U

Code to reproduce the dataframe:

d = {'id': {0: 0, 1: 1},
  'Name': {0: 'ABC', 1: 'XYZ'},
  'code': {0: 1, 1: 2},
  'code1': {0: 4.0, 1: np.nan},
  'code2': {0: 8, 1: 5},
  'type': {0: 'S', 1: 'R'},
  'type1': {0: 'E', 1: np.nan},
  'type2': {0: 'T', 1: 'U'}}
df = pd.DataFrame(d)
anky
  • 74,114
  • 11
  • 41
  • 70
  • I am unable to understand the logic about the name of columns and how you are separating it. The original columns are ID, Name, Attributes.ProducerCode.ProducerCode, Attributes.ProducerCode1.ProducerCode, Attributes.ProducerCode2.ProducerCode, Attributes.ProducerPartyRole.ProducerPartyRoleType, Attributes.ProducerPartyRole2.ProducerPartyRoleType, Attributes.ProducerPartyRole3.ProducerPartyRoleType. – Sai Astro Feb 28 '20 at 11:41
  • Btw, thank you so much for your patience. You are saving my butt at work! ( deadlines argh ! ) – Sai Astro Feb 28 '20 at 11:41
  • For wide to long to work, we would need the suffixes as numeric in an increasing manner, check how the column of m are before the wide to long line – anky Feb 28 '20 at 12:01
  • Hi, I tried ! no output. Check my picture, sorry cant take snippet. tinyurl.com/widelongso as you can see I renamed the columns too ! – Sai Astro Feb 28 '20 at 12:42
0

After few frustrating hours, I was able to complete this using a dirty trick.

I split the dataset in two.

id, name, code1, code2, code3

id, name, type1, type2, type3

I transposed the two dataframes

Example of first dataframe.

id, name, code1

id, name, code2

id, name, code3

Using the pd.concat([df1.df2], axix=1) I was able to concatenate it and removed NULL values and duplicate column names

FINAL DATAFRAME

id, name, code1, type1

id, name, code2, type2

id, name, code3, type3

Sai Astro
  • 113
  • 1
  • 7