8

let's say I have the following table of customer data

df = pd.DataFrame.from_dict({"Customer":[0,0,1], 
        "Date":['01.01.2016', '01.02.2016', '01.01.2016'], 
        "Type":["First Buy", "Second Buy", "First Buy"], 
        "Value":[10,20,10]})

which looks like this:

Customer |   Date   |   Type   |   Value
-----------------------------------------
       0 |01.01.2016|First Buy |     10 
-----------------------------------------
       0 |01.02.2016|Second Buy|     20 
-----------------------------------------
       1 |01.01.2016|First Buy |     10 

I want to pivot the table by the Type column. However, the pivoting only gives the numeric Value columns as a result. I'd desire a structure like:

 Customer | First Buy Date | First Buy Value | Second Buy Date | Second Buy Value
---------------------------------------------------------------------------------

where the missing values are NAN or NAT Is this possible using pivot_table. If not, I can imagine some workarounds, but they are quite lenghty. Any other suggestions?

Quickbeam2k1
  • 5,287
  • 2
  • 26
  • 42

1 Answers1

8

Use unstack:

df1 = df.set_index(['Customer', 'Type']).unstack()
df1.columns = ['_'.join(cols) for cols in df1.columns]
print (df1)
         Date_First Buy Date_Second Buy  Value_First Buy  Value_Second Buy
Customer                                                                  
0            01.01.2016      01.02.2016             10.0              20.0
1            01.01.2016            None             10.0               NaN

If need another order of columns use swaplevel and sort_index:

df1 = df.set_index(['Customer', 'Type']).unstack()

df1.columns = ['_'.join(cols) for cols in df1.columns.swaplevel(0,1)]
df1.sort_index(axis=1, inplace=True)
print (df1)
         First Buy_Date  First Buy_Value Second Buy_Date  Second Buy_Value
Customer                                                                  
0            01.01.2016             10.0      01.02.2016              20.0
1            01.01.2016             10.0            None               NaN
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Wow, really beautiful. The indexing features together with stacking are very neat. – Quickbeam2k1 Aug 10 '16 at 07:48
  • To rename the columns, is it possible to rename the columns using method chaining? – Quickbeam2k1 Aug 10 '16 at 07:56
  • sorry, I don't understand what you mean? I think method chaining could work using .pipe, a lambda function and either of your list comprehensions. Wondering if rename or rename_axis could work here, too – Quickbeam2k1 Aug 10 '16 at 08:01
  • Do you need `df1.rename(columns={'First Buy_Date': 'another_col1', 'Second Buy_Date': 'another_col2'})` ? – jezrael Aug 10 '16 at 08:07
  • I think this might work, though it will get tedious for lots of columns. However, the dict, might be generated from your list comprehension. My first attempt at piping failed, since you can't assign inside a lambda function. However, many thanks – Quickbeam2k1 Aug 10 '16 at 08:11
  • Another solution can be replace values in column `Type` before `unstack` - `df['Type'] = df.Type.map({'First Buy': 'a', 'Second Buy': 'b'})`, but need all values in `dict`, else you get `NaN`. – jezrael Aug 10 '16 at 08:13
  • I see, and also how this works when chaining. I just want to clearify my previous comment: Your naming scheme would work perfectly, particularly since there are many columns in my use case – Quickbeam2k1 Aug 10 '16 at 08:15
  • Hmmm, if in column names is no pattern, you need use all column names, you are right. – jezrael Aug 10 '16 at 08:17
  • 1
    @jezrael here is your solution as an obnoxious one liner with chaining. `df.set_index(['Customer', 'Type']).unstack().sort_index(1, 1).T.set_index(df.set_index(['Customer', 'Type']).unstack().sort_index(1, 1).columns.to_series().str[::-1].str.join('_')).T` – piRSquared Aug 10 '16 at 08:17
  • wau, it is crazy ;) Thank you @piRSquared – jezrael Aug 10 '16 at 08:19