1

I have a dataframe like:

    ID       Sim        Items          
    1        0.345      [7,7]            
    2        0.604      [2,7,3,8,5]      
    3        0.082      [9,1,9,1]

I want to form a pivot_table by:

df.pivot_table(index ="ID" , columns = "Items", values="Sim")

To do that, I have to extract list elements in items column and repeat the ID,Sim values for each unique elements in row_list. To be as:

    ID       Sim        Items          
    1        0.345        7  

    2        0.604        2 
    2        0.604        7  
    2        0.604        3  
    2        0.604        8  
    2        0.604        5

    3        0.082        9
    3        0.082        1

pivot table :

         7       2       3       8      5      1      9

    1  0.345     -       -       -      -      -      -
    2  0.604   0.604   0.604   0.604   0.604
    3    -      -        -       -     -     0.082  0.082        

Is there any pythonic approach for that? Or any suggestions?

nucsit026
  • 652
  • 7
  • 16

1 Answers1

2

Use explode(new in pandas 0.25+) before pivot;

df.explode('Items').pivot_table(index ="ID" , columns = "Items", values="Sim")

Items      1      2      3      5      7      8      9
ID                                                    
1        NaN    NaN    NaN    NaN  0.345    NaN    NaN
2        NaN  0.604  0.604  0.604  0.604  0.604    NaN
3      0.082    NaN    NaN    NaN    NaN    NaN  0.082

for lower versions of pandas, you can try with:

(df.drop('Items',1).join(pd.DataFrame(df['Items'].tolist())
 .stack(dropna=False).droplevel(1).rename('Items'))
  .pivot_table(index ="ID" , columns = "Items", values="Sim"))

Items      1      2      3      5      7      8      9
ID                                                    
1        NaN    NaN    NaN    NaN  0.345    NaN    NaN
2        NaN  0.604  0.604  0.604  0.604  0.604    NaN
3      0.082    NaN    NaN    NaN    NaN    NaN  0.082

If exact ordering matters , use reindex with unique of Items after explode:

(df.explode('Items').pivot_table(index ="ID" , columns = "Items", values="Sim")
       .reindex(df.explode('Items')['Items'].unique(),axis=1))

Items      7      2      3      8      5      9      1
ID                                                    
1      0.345    NaN    NaN    NaN    NaN    NaN    NaN
2      0.604  0.604  0.604  0.604  0.604    NaN    NaN
3        NaN    NaN    NaN    NaN    NaN  0.082  0.082
anky
  • 74,114
  • 11
  • 41
  • 70
  • pandas version 0.24.2 – nucsit026 Jan 18 '20 at 17:09
  • @nucsit026 I added a sample for pandas lower versions , if possible please [upgrade](https://stackoverflow.com/questions/22840449/how-to-update-pandas-from-anaconda-and-is-it-possible-to-use-eclipse-with-this-l) the version to access some cool features :) – anky Jan 18 '20 at 17:21
  • @anky-91 It result: `AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas` – nucsit026 Jan 18 '20 at 17:29
  • @nucsit026 cannot replicate that, it works fine for me, as said is it possoble to upgrade and try? – anky Jan 18 '20 at 17:30
  • 1
    @anky-91 Thanks again I will try. – nucsit026 Jan 18 '20 at 17:35
  • @nucsit026 check the edit for pandas lower versions, hope it works – anky Jan 18 '20 at 17:37