drop_duplicates() got an unexpected keyword argument 'ignore_index'

Question

On my machine, the code runs normally. But on my friend's machine, drop_duplicates() raises an error. The error message is the same as the title.

`pip install pandas -U` or `poetry update pandas` – DanielBell99 Apr 28 '22 at 10:06

score 2 · Answer 1 · edited Apr 15 '21 at 06:26

Open your command prompt and type `pip show pandas` to check which version of pandas is installed. If it's lower than 1.0.0, as @paulperry says, type `pip install --upgrade pandas --user` (substitute user with your Windows account name).

edited Apr 15 '21 at 06:26

Anand Vidvat

answered Apr 13 '21 at 09:45

yxykoo

All great, except don't substitute **--user** for anything. That tells it to use the current user. Putting your user name will just confuse it. – JonnyCab Oct 12 '21 at 13:42

score 0 · Answer 2 · answered May 06 '20 at 14:10

Run `import pandas as pd; pd.__version__` to see which version of pandas you are using, and make sure it's >= 1.0.
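If you want the script itself to report whether the installed version is new enough, a minimal sketch could look like the following. The helper name `major_minor` is made up for this example, and it only looks at the first two numeric parts of the version string, so treat it as a rough check rather than a full version parser.

```python
import pandas as pd


def major_minor(version):
    """Return (major, minor) as ints from a version string such as '1.1.3'.

    Hypothetical helper for illustration: only the leading digits of the
    first two dot-separated parts are used; anything else is ignored.
    """
    numbers = []
    for piece in version.split(".")[:2]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        numbers.append(int(digits) if digits else 0)
    # Pad so that a bare "2" still compares as (2, 0)
    while len(numbers) < 2:
        numbers.append(0)
    return tuple(numbers)


installed = major_minor(pd.__version__)
# DataFrame.drop_duplicates accepts ignore_index from pandas 1.0 onward
print(pd.__version__, "is >= 1.0:", installed >= (1, 0))
```

Running this on both machines should show quickly whether the two pandas installations differ.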

answered May 06 '20 at 14:10

paulperry

For me, the same error happens if I call `df["col_name"].drop_duplicates(keep="first", inplace=False, ignore_index=False)`, while `df.drop_duplicates("col_name", keep="first", inplace=False, ignore_index=False)` works as expected. That's because the pd.Series version of drop_duplicates doesn't accept the ignore_index argument, while the pd.DataFrame version does. `pd.__version__` is `1.0.5` – Ilya Chernov Nov 06 '20 at 16:37
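One way to see which of the two methods accepts the keyword on a given installation is to inspect the method signatures instead of relying on the version number. This is only a sketch: `accepts_keyword` is a hypothetical helper written for this example, not part of pandas, and what it prints depends on the pandas version you have installed.

```python
import inspect

import pandas as pd


def accepts_keyword(method, keyword):
    """Return True if `keyword` is a named parameter of `method`.

    Hypothetical helper for illustration; it only looks at the
    declared signature and does not call the method.
    """
    return keyword in inspect.signature(method).parameters


# Report, for this installation, whether each drop_duplicates takes ignore_index
for owner in (pd.DataFrame, pd.Series):
    supported = accepts_keyword(owner.drop_duplicates, "ignore_index")
    print(owner.__name__, "accepts ignore_index:", supported)
```

If the Series line prints False, passing `ignore_index` to `df["col_name"].drop_duplicates(...)` will raise the error from the title regardless of whether the DataFrame method supports it.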

billzoel · Answer 3 · 2020-12-22T21:22:07.513

I was having the same problem as Wzh -- but am running pandas version 1.1.3. So, it was not a version problem.

Ilya Chernov's comment pointed me in the right direction. I needed to extract a list of unique names from a single column in a more complicated DataFrame so that I could use that list in a lookup table. This seems like something others might need to do, so I will expand a bit on Chernov's comment with this example, using the sample csv file "iris.csv" that is available on GitHub. The file lists sepal and petal length for a number of iris varieties. Here we extract the variety names.

import pandas as pd

df = pd.read_csv('iris.csv')

# drop duplicates BEFORE extracting the column
names = df.drop_duplicates('variety', inplace=False, ignore_index=True)

# THEN extract the column you want
names = names['variety']
print(names)

Here is the output:

0        Setosa
1    Versicolor
2     Virginica
Name: variety, dtype: object

The key idea here is to get rid of the duplicate variety names while the object is still a DataFrame (without changing the original file), and then extract the one column that is of interest.
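For anyone who wants to try this without downloading the csv file, here is a small self-contained sketch. The DataFrame below is a made-up stand-in for iris.csv (the measurements are invented for illustration). It also shows a second route that never passes `ignore_index` to the Series method: drop the duplicates on the column and then call `reset_index(drop=True)`.

```python
import pandas as pd

# Made-up stand-in for iris.csv; the values are for illustration only
df = pd.DataFrame({
    "sepal.length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "variety": ["Setosa", "Setosa", "Versicolor",
                "Versicolor", "Virginica", "Virginica"],
})

# Route 1: drop duplicates while it is still a DataFrame, then pick the column
# (ignore_index on DataFrame.drop_duplicates needs pandas >= 1.0)
names_df_route = df.drop_duplicates("variety", ignore_index=True)["variety"]

# Route 2: pick the column first, then renumber the index by hand,
# so ignore_index is never passed to Series.drop_duplicates
names_series_route = df["variety"].drop_duplicates().reset_index(drop=True)

print(names_series_route.tolist())
print("same result:", names_df_route.equals(names_series_route))
print("rows in original df:", len(df))
```

Neither route modifies `df`, so the original data is still there for the rest of the analysis.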
