8

I'm trying to reshape my pd dataframe with the following function:

 ar = ar.pivot(index='Received', columns='Merch Ref', values='acceptance_rate')

The dataset looks like:

     Merch Ref            Received  acceptance_rate
0           SF 2014-08-28 15:38:00                0
1           SF 2014-08-28 15:44:00                0
2           SF 2014-08-28 16:04:00                0
3           WF 2014-08-28 16:05:00                0
4           WF 2014-08-28 16:07:00                0
5           SF 2014-08-28 16:34:00                0
6           SF 2014-08-28 16:55:00                0
7           BF 2014-08-28 17:59:00                0
8           BF 2014-08-29 15:05:00                0
9           SF 2014-08-29 21:25:00                0
10          SF 2014-08-30 10:29:00                0
...

What I'd like to obtain is:

                      SF WF BF 
2014-08-28 15:38:00    0  1  0
2014-08-28 15:44:00    0  1  0
2014-08-28 16:04:00    0  0  1
2014-08-28 16:05:00    1  1  0
2014-08-28 16:07:00    0  0  1
2014-08-28 16:34:00    1  1  0
2014-08-28 16:55:00    1  1  0
2014-08-28 17:59:00    0  1  0
2014-08-29 15:05:00    0  0  1
2014-08-29 21:25:00    0  0  1 
2014-08-30 10:29:00    0  1  0

However, I get the error:

 ValueError: Index contains duplicate entries, cannot reshape

This is because i have some orders placed at the same time. Is there a way to sum/aggregate these orders ?

Blue Moon
  • 4,421
  • 20
  • 52
  • 91
  • I tried with: ar = ar.pivot_table(ar, values='acceptance_rate', index='Received', columns='Merch Ref') and I got: TypeError: pivot_table() got multiple values for keyword argument 'values' – Blue Moon Aug 03 '15 at 11:05
  • 2
    `ar.pivot_table(values='acceptance_rate', index='Received', columns='Merch Ref') ` – HYRY Aug 03 '15 at 11:09
  • it worked. great .thanks. You can put it in as answer so that I can close the question – Blue Moon Aug 03 '15 at 11:11

3 Answers3

2

Try to remove duplicate:

ar = ar.drop_duplicates(['Received','Merch Ref'])

it should work

Farid
  • 169
  • 1
  • 8
1

As you identified, the error occurs from duplicates in pairs (x, y) for x in Received and y in Merch Ref.

If you would like to aggregate by sum then

ar.pivot_table(index='Received', columns='Merch Ref',
               values='acceptance_rate', aggfunc=np.sum)

. The default aggregation function is mean. That is,

ar.pivot_table(index='Received', columns='Merch Ref',
               values='acceptance_rate')

, will pivot the table and all entries with the same (x, y) pair will be aggregated with the np.mean function.

Remark: I initially received the same error, but after iterating through the (x, y) pairs I didn't find any duplicates. It turns out some of the pairs were of the form (nan, nan) and were omitted from the iteration process. Thus for other users trying to debug what they believe are unique pairs, consider checking for nans with pd.isnull or pd.notnull.

timctran
  • 505
  • 4
  • 10
0

You should use crosstab.

Example:

pd.crosstab(index=ar.Recieved, columns=ar['Merch Ref'], aggfunc='count')
Amnon
  • 2,212
  • 1
  • 19
  • 35