2

On my machine the code runs normally, but on my friend's machine `drop_duplicates()` raises an error. The error is the one given in the title.

Wzh

3 Answers

2

Open a command prompt and run `pip show pandas` to check which version of pandas is installed. If it is lower than 1.0.0, as @paulperry says, run `pip install --upgrade pandas --user` (substitute user with your Windows account name).

Anand Vidvat
yxykoo
  • All great, except don't substitute **--user** for anything. That tells it to use the current user. Putting your user name will just confuse it. – JonnyCab Oct 12 '21 at 13:42
0

Run `import pandas as pd; print(pd.__version__)` to see which version of pandas you are using, and make sure it is >= 1.0.
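A minimal sketch of that check in code, assuming the version string starts with numeric major and minor parts. The helper name `dataframe_supports_ignore_index` is made up for this illustration; the 1.0 threshold is the one stated in this answer.

```python
import pandas as pd


def dataframe_supports_ignore_index(version: str) -> bool:
    """Return True if `version` is at least 1.0 (threshold from this answer).

    Hypothetical helper: only the leading "major.minor" part is compared.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (1, 0)


# Report the installed version and whether it meets the threshold.
print(pd.__version__, dataframe_supports_ignore_index(pd.__version__))
```

Note that this only covers the DataFrame method; as the comment on this answer points out, calling `drop_duplicates` on a single column is a separate case.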

paulperry
  • I get the same error with `df["col_name"].drop_duplicates(keep="first", inplace=False, ignore_index=False)`, while `df.drop_duplicates("col_name", keep="first", inplace=False, ignore_index=False)` works as expected. That's because the pd.Series version of drop_duplicates doesn't accept an ignore_index argument, while the pd.DataFrame version does. My `pd.__version__` is `1.0.5`. – Ilya Chernov Nov 06 '20 at 16:37
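One way to check this on a given installation, sketched here with the standard-library `inspect` module, is to look at whether each method's signature lists `ignore_index`. The helper `accepts_ignore_index` is a hypothetical name used only for this example, and the result for `pd.Series` depends on the installed pandas version.

```python
import inspect

import pandas as pd


def accepts_ignore_index(method) -> bool:
    """Hypothetical helper: True if `method` has an `ignore_index` parameter."""
    return "ignore_index" in inspect.signature(method).parameters


# Compare the two drop_duplicates methods on the installed pandas.
print("DataFrame:", accepts_ignore_index(pd.DataFrame.drop_duplicates))
print("Series:   ", accepts_ignore_index(pd.Series.drop_duplicates))
```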
0

I had the same problem as Wzh, but I am running pandas version 1.1.3, so it was not a version problem.

Ilya Chernov's comment pointed me in the right direction. I needed to extract a list of unique names from a single column in a more complicated DataFrame so that I could use that list in a lookup table. This seems like something others might need to do, so I will expand a bit on Chernov's comment with this example, using the sample csv file "iris.csv" that is available on GitHub. The file lists sepal and petal measurements for a number of iris varieties. Here we extract the variety names.

import pandas as pd

df = pd.read_csv('iris.csv')

# drop duplicates BEFORE extracting the column
names = df.drop_duplicates('variety', inplace=False, ignore_index=True)

# THEN extract the column you want
names = names['variety']
print(names)

Here is the output:

0        Setosa
1    Versicolor
2     Virginica
Name: variety, dtype: object

The key idea here is to get rid of the duplicate variety names while the object is still a DataFrame (with inplace=False, the original DataFrame is left unchanged), and then extract the one column that is of interest.
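If only the column is needed, another option is to deduplicate the Series and renumber it with `reset_index(drop=True)`, which does not rely on `ignore_index` at all. The sketch below uses a small made-up DataFrame in place of iris.csv (the column name `sepal_length` and its values are invented for illustration) and compares both routes.

```python
import pandas as pd

# Made-up stand-in for iris.csv; values are illustrative only.
df = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3],
        "variety": ["Setosa", "Setosa", "Versicolor", "Versicolor", "Virginica"],
    }
)

# Route 1 (as in this answer): deduplicate the DataFrame, then take the column.
via_frame = df.drop_duplicates("variety", ignore_index=True)["variety"]

# Route 2: deduplicate the Series, then renumber the index from 0.
via_series = df["variety"].drop_duplicates().reset_index(drop=True)

print(via_series.tolist())
print(via_frame.equals(via_series))
```

Both routes leave `df` itself untouched, since neither call uses `inplace=True`.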

billzoel