First we start with set_index
and stack()
.
We then extract the test value and place it with its own column and reset the index:
s = df.set_index(['id','name'],append=True).stack().to_frame('tests')
s['values'] = s['tests'].str.extract('(\d+)')[0]
df1 = s.dropna(subset=['values']).reset_index([1,2,3])
df1['tests'] = df1['tests'].str.replace('\(.*\)','')
print(df1)
id name level_3 tests values
0 1 abc sec1 mp 6
0 1 abc sec2 up 7
0 1 abc sec3 sp 8
0 1 abc sec4 cp 7
Okay, now we need to something very similar to a cross join in SQL, that is repeat the dataframe according the len
of the slots and add each value accordingly.
Note, this is making a product of your dataframe, a better way would be to use merge but you've not made your input clear, as in how multiple rows will appear.
Let's try this with pd.concat
making use of the keys
argument:
vals = s.loc[s['values'].isna(),'tests'].values
final = pd.concat([df1 ]* len(vals),0,keys=vals)\
.reset_index(0).rename(columns={'level_0' : 'slots'})
print(final[['id','name','tests','values','slots']])
id name tests values slots
0 1 abc mp 6 2
0 1 abc up 7 2
0 1 abc sp 8 2
0 1 abc cp 7 2
0 1 abc mp 6 4
0 1 abc up 7 4
0 1 abc sp 8 4
0 1 abc cp 7 4
Avoiding a total product using merge
:
vals = s.loc[s['values'].isna(), 'tests'].reset_index([2,3],drop=True)\
.to_frame('slots').reset_index(1)
print(pd.merge(df1,vals,on=['id']))
id name level_3 tests values slots
0 1 abc sec1 mp 6 2
1 1 abc sec1 mp 6 4
2 1 abc sec2 up 7 2
3 1 abc sec2 up 7 4
4 1 abc sec3 sp 8 2
5 1 abc sec3 sp 8 4
6 1 abc sec4 cp 7 2
7 1 abc sec4 cp 7 4