1

I have a table as

id   name  sec1   sec2   sec3  sec4  sec5  sec6
1    abc   mp(6)  up(7)  sp(8) cp(7)  2     4  

and i want output as

id   name   tests  values   slots
1    abc     mp     6        2
1    abc     up     7        2
1    abc     sp     8        2
1    abc     cp     7        2
1    abc     mp     6        4
1    abc     up     7        4
1    abc     sp     8        4
1    abc     cp     7        4

Can someone help me with normalising this with melt. Thanks in advance

naina
  • 303
  • 3
  • 14

2 Answers2

1

First we start with set_index and stack().

We then extract the test value and place it with its own column and reset the index:

s = df.set_index(['id','name'],append=True).stack().to_frame('tests')
s['values'] = s['tests'].str.extract('(\d+)')[0]
df1 = s.dropna(subset=['values']).reset_index([1,2,3])
df1['tests'] = df1['tests'].str.replace('\(.*\)','')

print(df1)
   id name level_3 tests values
0   1  abc    sec1    mp      6
0   1  abc    sec2    up      7
0   1  abc    sec3    sp      8
0   1  abc    sec4    cp      7

Okay, now we need to something very similar to a cross join in SQL, that is repeat the dataframe according the len of the slots and add each value accordingly.

Note, this is making a product of your dataframe, a better way would be to use merge but you've not made your input clear, as in how multiple rows will appear.

Let's try this with pd.concat making use of the keys argument:

vals = s.loc[s['values'].isna(),'tests'].values

final = pd.concat([df1 ]* len(vals),0,keys=vals)\
            .reset_index(0).rename(columns={'level_0' : 'slots'})

print(final[['id','name','tests','values','slots']])

   id name tests values  slots
0   1  abc    mp      6      2
0   1  abc    up      7      2
0   1  abc    sp      8      2
0   1  abc    cp      7      2
0   1  abc    mp      6      4
0   1  abc    up      7      4
0   1  abc    sp      8      4
0   1  abc    cp      7      4

Avoiding a total product using merge:

vals = s.loc[s['values'].isna(), 'tests'].reset_index([2,3],drop=True)\
                   .to_frame('slots').reset_index(1)


print(pd.merge(df1,vals,on=['id']))


   id name level_3 tests values slots
0   1  abc    sec1    mp      6     2
1   1  abc    sec1    mp      6     4
2   1  abc    sec2    up      7     2
3   1  abc    sec2    up      7     4
4   1  abc    sec3    sp      8     2
5   1  abc    sec3    sp      8     4
6   1  abc    sec4    cp      7     2
7   1  abc    sec4    cp      7     4
halfer
  • 19,824
  • 17
  • 99
  • 186
Umar.H
  • 22,559
  • 7
  • 39
  • 74
  • 1
    Nice answer its really tricky question to solve. Keep up the good work :) – Shubham Sharma Feb 25 '21 at 14:52
  • Really interesting question! I thought its worth trying out..Here is one solution I came up with if you would like to check it out [here is the link](https://repl.it/@ShubhamSharma13/BlissfulGruesomeEvent#main.py). And thanks for for letting me know about this question :-) – Shubham Sharma Feb 25 '21 at 15:10
  • 1
    I wasn't sure if it was tough or if I was misunderstanding the data structure, in SQL this would I think i could do this with fewer lines of code, but would struggle with regex extracts. nice solution :) @ShubhamSharma – Umar.H Feb 25 '21 at 15:14
0

One way about it is to melt twice, then extract the text and numbers into separate columns:

(
    df.melt(["id", "name", "sec5", "sec6"], var_name="var1", value_name="tests")
      .melt(["id", "name", "var1", "tests"], value_name="slots")
      .assign(values=lambda df: df.tests.str.extract(r"(\d)"),
             tests=lambda df: df.tests.str.extract(r"([a-z]+)"))
      .loc[:, ["id", "name", "tests", "values", "slots"]]
)

   id name tests values  slots
0   1  abc    mp      6      2
1   1  abc    up      7      2
2   1  abc    sp      8      2
3   1  abc    cp      7      2
4   1  abc    mp      6      4
5   1  abc    up      7      4
6   1  abc    sp      8      4
7   1  abc    cp      7      4
sammywemmy
  • 27,093
  • 4
  • 17
  • 31