1

I have a data-frame like as shown below

df = pd.DataFrame({
'subject_ID':[1,2,3,4,5],
'date_visit':['1/1/2020','3/3/2200','13/11/2100','24/05/2198','30/03/2071'],
'a11fever':['Yes','No','Yes','Yes','No'],
'a12diagage':[36,34,42,40,np.nan],
'a12diagyr':[2021,3213,2091,4567,8901],
'a12diagyrago':[6,np.nan,9,np.nan,np.nan]})

I would like to transform the dataframe where the sample output for one subject looks like as shown below

enter image description here

Though I was able to do this successfully using pd.melt and stack, I couldn't do the same using wide_long.

pd.melt(df, id_vars =['subject_ID','date_visit'], value_vars =['a11fever', 'a12diagage', 'a12diagyr','a12diagyrago'])  # works fine

pd.wide_to_long(df, stubnames=['measurement', 'val'],i=(['subject_ID','date_visit']), j='grp').sort_index(level=0) # returns 0 records

df.set_index(['subject_ID','date_visit']).stack().reset_index() #works fine

another question I have is,

a) Do we always have to mention all the column names that we would like to transform under value_vars section of pd.melt. My real data will have more than 120 columns. So do I have to mention all of them here?

Can you also help me with this on how to do it using wide_long?

The Great
  • 7,215
  • 7
  • 40
  • 128

2 Answers2

5

Do we always have to mention all the column names that we would like to transform under value_vars section of pd.melt. My real data will have more than 120 columns. So do I have to mention all of them here?

No, not necessary, if omit parameter value_vars then are used all columns without used for id_vars:

df = pd.melt(df, id_vars =['subject_ID','date_visit'])
print (df)

    subject_ID  date_visit      variable value
0            1    1/1/2020      a11fever   Yes
1            2    3/3/2200      a11fever    No
2            3  13/11/2100      a11fever   Yes
3            4  24/05/2198      a11fever   Yes
4            5  30/03/2071      a11fever    No
5            1    1/1/2020    a12diagage    36
6            2    3/3/2200    a12diagage    34
7            3  13/11/2100    a12diagage    42
8            4  24/05/2198    a12diagage    40
9            5  30/03/2071    a12diagage   NaN
10           1    1/1/2020     a12diagyr  2021
11           2    3/3/2200     a12diagyr  3213
12           3  13/11/2100     a12diagyr  2091
13           4  24/05/2198     a12diagyr  4567
14           5  30/03/2071     a12diagyr  8901
15           1    1/1/2020  a12diagyrago     6
16           2    3/3/2200  a12diagyrago   NaN
17           3  13/11/2100  a12diagyrago     9
18           4  24/05/2198  a12diagyrago   NaN
19           5  30/03/2071  a12diagyrago   NaN
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
3

This is not a usecase for pd.wide_to_long since it will generate incorrect output. You have to use stubnames and these will be transformed to columns (a11 & a12). See example:

melt = pd.wide_to_long(df, 
                i=['subject_ID', 'date_visit'], 
                stubnames=['a11', 'a12'], 
                suffix='\D+', 
                j='fever_diag').reset_index()
    subject_ID  date_visit fever_diag  a11     a12
0            1    1/1/2020    diagage  NaN    36.0
1            1    1/1/2020     diagyr  NaN  2021.0
2            1    1/1/2020  diagyrago  NaN     6.0
3            1    1/1/2020      fever  Yes     NaN
4            2    3/3/2200    diagage  NaN    34.0
5            2    3/3/2200     diagyr  NaN  3213.0
6            2    3/3/2200  diagyrago  NaN     NaN
7            2    3/3/2200      fever   No     NaN
8            3  13/11/2100    diagage  NaN    42.0
9            3  13/11/2100     diagyr  NaN  2091.0
10           3  13/11/2100  diagyrago  NaN     9.0
11           3  13/11/2100      fever  Yes     NaN
12           4  24/05/2198    diagage  NaN    40.0
13           4  24/05/2198     diagyr  NaN  4567.0
14           4  24/05/2198  diagyrago  NaN     NaN
15           4  24/05/2198      fever  Yes     NaN
16           5  30/03/2071    diagage  NaN     NaN
17           5  30/03/2071     diagyr  NaN  8901.0
18           5  30/03/2071  diagyrago  NaN     NaN
19           5  30/03/2071      fever   No     NaN
Erfan
  • 40,971
  • 8
  • 66
  • 78
  • Okay. Understand. I was thinking that `stack`,`melt`,`wide_long` can help us achieve identical output. Am I wrong? upvoted – The Great Nov 06 '19 at 12:44
  • Well it completely depends on what your output format has to be. I tried to show the usecase of `wide_to_long`, it takes stubnames (prefixes) of certain columns and returns the suffixes as values in a different column. Hope this clears it up. – Erfan Nov 06 '19 at 12:52
  • 1
    Thank you for the answer. Much appreciated. marked as answer – The Great Nov 06 '19 at 13:05