1

I have a dataframe:

df = key1 key2 .. keyn type  val1 val2 .. valn
      k1   k2      kn   p1    1     2      7
      k1   k2      kn   p2    6     1      5
      k1   k2      kn   p3    8     4      1
      k3   k2      kn   p1    4     6      9
      k3   k2      kn   p2    6     1      0
      k3   k2      kn   p3    1     2      8

So , foreach set of keys key..keyn I have 3 values in the column type. I want to unmelt it to columns to have a column per <type, val> pairing so I will end up with:

df = key1 key2 .. keyn val1_typep1 val1_typep2 val1_typep3 ..  valn_typep1 valn_typep2 valn_typep3  
      k1   k2      kn       1          6            8               7           5           1 
      k3   k2      kn       4          6            1               9           0           8 

What is the bst way to do so?

Cranjis
  • 1,590
  • 8
  • 31
  • 64

3 Answers3

1

You can filter the key like columns then pivot to reshape the dataframe with index as keys and columns as type, finally flatten the multiindex columns using map + .join:

# filter the key like columns
keys = df.filter(like='key').columns

# pivot the dataframe on keys and type
pvt = df.pivot(index=list(keys), columns='type')

# Flatten the multiindex columns
# by joining around seperator _type
pvt.columns = pvt.columns.map('_type'.join)
pvt = pvt.reset_index()

>>> pvt

  key1 key2 keyn  val1_typep1  val1_typep2  val1_typep3  val2_typep1  val2_typep2  val2_typep3  valn_typep1  valn_typep2  valn_typep3
0   k1   k2   kn            1            6            8            2            1            4            7            5            1
Shubham Sharma
  • 68,127
  • 6
  • 24
  • 53
  • I get the error "{ValueError}The name type occurs multiple times, use a level number", any idea? (just for clarification, I have many triplets, I just put one as example. Added another triplet to the original post – Cranjis Feb 07 '21 at 09:10
  • @okuoub Checked the code with your update example and its working fine for me. btw which pandas version are you using? – Shubham Sharma Feb 07 '21 at 09:15
  • @Shubam Sharma version 1.1, any idea why I get the error? – Cranjis Feb 07 '21 at 09:41
  • @okuoub Not sure..seems like bug but i will have to check again..In the meantime can you try `pivot_table` instead of `pivot` to see if it works? – Shubham Sharma Feb 07 '21 at 09:53
1

simply use stack and unstack from pandas.

with my understanding, use unstack to turn the inner index's contents to columns.

As using stack to turn the columns to the inner index's contents.

# sample data
data = [{'key1': 'k1', 'key2': 'k2', 'keyn': 'kn', 'type': 'p1', 'val1': 1, 'val2': 2, 'valn': 7}, {'key1': 'k1', 'key2': 'k2', 'keyn': 'kn', 'type': 'p2', 'val1': 6, 'val2': 1, 'valn': 5}, {'key1': 'k1', 'key2': 'k2', 'keyn': 'kn', 'type': 'p3', 'val1': 8, 'val2': 4, 'valn': 1}, {'key1': 'k3', 'key2': 'k2', 'keyn': 'kn', 'type': 'p1', 'val1': 4, 'val2': 6, 'valn': 9}, {'key1': 'k3', 'key2': 'k2', 'keyn': 'kn', 'type': 'p2', 'val1': 6, 'val2': 1, 'valn': 0}, {'key1': 'k3', 'key2': 'k2', 'keyn': 'kn', 'type': 'p3', 'val1': 1, 'val2': 2, 'valn': 8}]
df = pd.DataFrame(data)

# stack the df, with index `['key1', 'key2', 'keyn', 'type']`
cols = df.columns[df.columns.str.startswith('key')].tolist()
cols.append('type')
# print(cols) # ['key1', 'key2', 'keyn', 'type']
dfn = df.set_index(cols).loc[:, 'val1':'valn'].stack().reset_index()
# print(dfn)

# create col_name as `valn_typep1`
dfn['col_name'] = dfn.iloc[:,-2] + '_type'  +  dfn.iloc[:,-3] 

# set index with `['key1', 'key2', 'keyn', 'col_name']`, value is column 0, and unstack, transfer index col_name to columns
cols = df.columns[df.columns.str.startswith('key')].tolist()
cols.append('col_name')
# print(cols) # ['key1', 'key2', 'keyn', 'col_name']
df_result = dfn.set_index(cols)[0].unstack().reset_index()
# print(df_result)

result:

print(dfn)

       key1 key2 keyn type level_4  0
    0    k1   k2   kn   p1    val1  1
    1    k1   k2   kn   p1    val2  2
    2    k1   k2   kn   p1    valn  7
    3    k1   k2   kn   p2    val1  6
    4    k1   k2   kn   p2    val2  1
    5    k1   k2   kn   p2    valn  5
    6    k1   k2   kn   p3    val1  8
    7    k1   k2   kn   p3    val2  4
    8    k1   k2   kn   p3    valn  1
    9    k3   k2   kn   p1    val1  4
    10   k3   k2   kn   p1    val2  6
    11   k3   k2   kn   p1    valn  9
    12   k3   k2   kn   p2    val1  6
    13   k3   k2   kn   p2    val2  1
    14   k3   k2   kn   p2    valn  0
    15   k3   k2   kn   p3    val1  1
    16   k3   k2   kn   p3    val2  2
    17   k3   k2   kn   p3    valn  8

print(df_result)

    col_name key1 key2 keyn  val1_typep1  val1_typep2  val1_typep3  val2_typep1  \
    0          k1   k2   kn            1            6            8            2   
    1          k3   k2   kn            4            6            1            6   
    
    col_name  val2_typep2  val2_typep3  valn_typep1  valn_typep2  valn_typep3  
    0                   1            4            7            5            1  
    1                   1            2            9            0            8 

Ferris
  • 5,325
  • 1
  • 14
  • 23
0

use melt id_vars for the the index fields and vars for the fields to melt then sort the results by the index fields

data = [{'key1': 'k1', 'key2': 'k2', 'keyn': 'kn', 'type': 'p1', 'val1': 1, 'val2': 2, 'valn': 7}, {'key1': 'k1', 'key2': 'k2', 'keyn': 'kn', 'type': 'p2', 'val1': 6, 'val2': 1, 'valn': 5}, {'key1': 'k1', 'key2': 'k2', 'keyn': 'kn', 'type': 'p3', 'val1': 8, 'val2': 4, 'valn': 1}, {'key1': 'k3', 'key2': 'k2', 'keyn': 'kn', 'type': 'p1', 'val1': 4, 'val2': 6, 'valn': 9}, {'key1': 'k3', 'key2': 'k2', 'keyn': 'kn', 'type': 'p2', 'val1': 6, 'val2': 1, 'valn': 0}, {'key1': 'k3', 'key2': 'k2', 'keyn': 'kn', 'type': 'p3', 'val1': 1, 'val2': 2, 'valn': 8}]
df = pd.DataFrame(data)
print(df)
results=pd.melt(df,value_vars=['val1','val2','valn'],id_vars=['key1','key2','keyn','type'])
print(results.sort_values(by=['key1','key2','keyn','type']))


output:
   key1 key2 keyn type variable  value
0    k1   k2   kn   p1     val1      1
6    k1   k2   kn   p1     val2      2
12   k1   k2   kn   p1     valn      7
1    k1   k2   kn   p2     val1      6
7    k1   k2   kn   p2     val2      1
13   k1   k2   kn   p2     valn      5
2    k1   k2   kn   p3     val1      8
8    k1   k2   kn   p3     val2      4
14   k1   k2   kn   p3     valn      1
3    k3   k2   kn   p1     val1      4
9    k3   k2   kn   p1     val2      6
15   k3   k2   kn   p1     valn      9
4    k3   k2   kn   p2     val1      6
10   k3   k2   kn   p2     val2      1
16   k3   k2   kn   p2     valn      0
5    k3   k2   kn   p3     val1      1
11   k3   k2   kn   p3     val2      2
17   k3   k2   kn   p3     valn      8

#reverse the melt

fp=results.pivot(index=['key1','key2','keyn','type'],columns= 
['variable'],values=['value'])
#fp = fp[(fp.T != 0).any()]
print(fp)

value          
variable             val1 val2 valn
key1 key2 keyn type                
k1   k2   kn   p1       1    2    7
               p2       6    1    5
               p3       8    4    1
k3   k2   kn   p1       4    6    9
               p2       6    1    0
               p3       1    2    8

Golden Lion
  • 3,840
  • 2
  • 26
  • 35