Your problem is solved by turning off the sort property of groupby.
df1 = pd.DataFrame({"a":[0,100,200], "by":["C","B","B"]}, index=[0,1,1])
df1.groupby("by", sort=False).diff()
print(df1)
Result:
a by
0 0 C
1 100 B
1 200 B
Explanation:
Even if you "cannot reindex from a duplicate axis", Pandas tries to do it by assigning a rank to letters by their alphabetical order when the sort property is activated, for instance :
A ---> 1
B ---> 2
B ---> 3
Even if we have 2 B's the incrementation is possible by considering the second B comes logically after the first B.
For example the chunks of code below works perfectly:
import pandas as pd
# THE CODE BELOW WORKS PERFECTLY
df1 = pd.DataFrame({"a":[0,100,90], "by":["A","B","B"]}, index=[0,1,1])
df1.groupby("by").diff()
print(df1)
df1 = pd.DataFrame({"a":[0,100,90], "by":["B","C","C"]}, index=[0,1,1])
df1.groupby("by").diff()
print(df1)
df1 = pd.DataFrame({"a":[0,100,90], "by":["C","D","D"]}, index=[0,1,1])
df1.groupby("by").diff()
print(df1)
Because D comes after C, C comes after B and so on...
Pandas tries to find a logic
what is illogical considering the alphabetical order is this:
DCC ---> You could not assign 1 to D thus 2 to C.
Chunks of code below generate errors:
# EVERY CHUNK OF CODE BELOW GENERATES AN ERROR
df1 = pd.DataFrame({"a":[0,100,90], "by":["B","A","A"]}, index=[0,1,1])
df1.groupby("by").diff()
print(df1)
# builtins.ValueError: cannot reindex from a duplicate axis
df1 = pd.DataFrame({"a":[0,100,90], "by":["D","C","C"]}, index=[0,1,1])
df1.groupby("by").diff()
print(df1)
# builtins.ValueError: cannot reindex from a duplicate axis
df1 = pd.DataFrame({"a":[0,100,90], "by":["E","D","D"]}, index=[0,1,1])
df1.groupby("by").diff()
print(df1)
# builtins.ValueError: cannot reindex from a duplicate axis
To go further:
Let's consider these 2 chunks and their results:
df1 = pd.DataFrame({"a":[0,100,200], "by":["E","D","F"]}, index=[0,1,1])
df1.groupby("by").diff()
print(df1)
# builtins.ValueError: cannot reindex from a duplicate axis
with only a change on index...
df1 = pd.DataFrame({"a":[0,100,200], "by":["E","D","F"]}, index=[0,1,2])
df1.groupby("by").diff()
print(df1)
# a by
# 0 0 E
# 1 100 D
# 2 200 F
Even if EDF is not the alphabetical order, Pandas seems to go for a sort logic by using the index...
index is 011 in the first case with no sorting logic
that's not the case with 012
In conclusion you have to desactivate sorting by turning it to False property to prevent Pandas sorting attempts