ValueError: Index contains duplicate entries, cannot reshape

Question

I'm trying to reshape my pd dataframe with the following function:

 ar = ar.pivot(index='Received', columns='Merch Ref', values='acceptance_rate')

The dataset looks like:

     Merch Ref            Received  acceptance_rate
0           SF 2014-08-28 15:38:00                0
1           SF 2014-08-28 15:44:00                0
2           SF 2014-08-28 16:04:00                0
3           WF 2014-08-28 16:05:00                0
4           WF 2014-08-28 16:07:00                0
5           SF 2014-08-28 16:34:00                0
6           SF 2014-08-28 16:55:00                0
7           BF 2014-08-28 17:59:00                0
8           BF 2014-08-29 15:05:00                0
9           SF 2014-08-29 21:25:00                0
10          SF 2014-08-30 10:29:00                0
...

What I'd like to obtain is:

                      SF WF BF 
2014-08-28 15:38:00    0  1  0
2014-08-28 15:44:00    0  1  0
2014-08-28 16:04:00    0  0  1
2014-08-28 16:05:00    1  1  0
2014-08-28 16:07:00    0  0  1
2014-08-28 16:34:00    1  1  0
2014-08-28 16:55:00    1  1  0
2014-08-28 17:59:00    0  1  0
2014-08-29 15:05:00    0  0  1
2014-08-29 21:25:00    0  0  1 
2014-08-30 10:29:00    0  1  0

However, I get the error:

 ValueError: Index contains duplicate entries, cannot reshape

This is because i have some orders placed at the same time. Is there a way to sum/aggregate these orders ?

I tried with: ar = ar.pivot_table(ar, values='acceptance_rate', index='Received', columns='Merch Ref') and I got: TypeError: pivot_table() got multiple values for keyword argument 'values' — Blue Moon, Aug 03 '15 at 11:05
`ar.pivot_table(values='acceptance_rate', index='Received', columns='Merch Ref') ` — HYRY, Aug 03 '15 at 11:09
it worked. great .thanks. You can put it in as answer so that I can close the question — Blue Moon, Aug 03 '15 at 11:11

score 2 · Answer 1 · answered Feb 08 '19 at 03:00

2

Try to remove duplicate:

ar = ar.drop_duplicates(['Received','Merch Ref'])

it should work

answered Feb 08 '19 at 03:00

Farid

169
1
8

ar = ar.drop_duplicates(['Received'']) this was enough for me – kablanfatih Dec 21 '20 at 15:27

score 1 · Answer 2 · answered Aug 17 '17 at 17:56

As you identified, the error occurs from duplicates in pairs (x, y) for x in Received and y in Merch Ref.

If you would like to aggregate by sum then

ar.pivot_table(index='Received', columns='Merch Ref',
               values='acceptance_rate', aggfunc=np.sum)

. The default aggregation function is mean. That is,

ar.pivot_table(index='Received', columns='Merch Ref',
               values='acceptance_rate')

, will pivot the table and all entries with the same (x, y) pair will be aggregated with the np.mean function.

Remark: I initially received the same error, but after iterating through the (x, y) pairs I didn't find any duplicates. It turns out some of the pairs were of the form (nan, nan) and were omitted from the iteration process. Thus for other users trying to debug what they believe are unique pairs, consider checking for nans with pd.isnull or pd.notnull.

score 0 · Answer 3 · answered Jun 28 '20 at 16:54

0

You should use crosstab.

Example:

pd.crosstab(index=ar.Recieved, columns=ar['Merch Ref'], aggfunc='count')

answered Jun 28 '20 at 16:54

Amnon

2,212
1
19
35

ValueError: Index contains duplicate entries, cannot reshape

3 Answers3