I am trying to follow a tutorial to perform Anomaly Detection in PyCaret. While running setup(), I keep getting this error:

AttributeError: 'DataFrame' object has no attribute 'unique'

This error happens anytime I have a categorical variable in the dataset. Below is the code I used and a screenshot of the error:

```
import pandas as pd
from pycaret.anomaly import setup

data = pd.read_csv('https://raw.githubusercontent.com/numenta/NAB/master/data/realKnownCause/nyc_taxi.csv')
data['timestamp'] = pd.to_datetime(data['timestamp'])

data.set_index('timestamp', drop=True, inplace=True)
# resample timeseries to hourly
data = data.resample('H').sum()
# create features from date
data['day'] = [i.day for i in data.index]
data['day_name'] = [i.day_name() for i in data.index]  # string (categorical) column
data['day_of_year'] = [i.dayofyear for i in data.index]
data['week_of_year'] = [i.weekofyear for i in data.index]
data['hour'] = [i.hour for i in data.index]
data['is_weekday'] = [i.isoweekday() for i in data.index]  # ISO weekday number, 1 (Mon) to 7 (Sun)

s = setup(data, session_id=123)
```

[enter image description here][1]


  [1]: https://i.stack.imgur.com/LOWMn.png
vraka0723

2 Answers


Yes, the error is clear. The dataframe does not have a 'unique' column. You can check that out by:

import pandas as pd

data = pd.read_csv('https://raw.githubusercontent.com/numenta/NAB/master/data/realKnownCause/nyc_taxi.csv')
data.head()

Output:

             timestamp  value
0  2014-07-01 00:00:00  10844
1  2014-07-01 00:30:00   8127
2  2014-07-01 01:00:00   6210
3  2014-07-01 01:30:00   4656
4  2014-07-01 02:00:00   3820
Hamzah
  • Wouldn't that give a KeyError? So far as I can see pandas DataFrame doesn't have a unique method (series does). For DataFrames there's drop_duplicates. – s_pike Sep 20 '22 at 12:47
  • No, there is no KeyError. Yes, there is no 'unique' attribute. – Hamzah Sep 20 '22 at 12:51
  • If you try and access a column that doesn't exist pandas throws a KeyError – s_pike Sep 20 '22 at 12:56
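
The distinction discussed in these comments can be checked directly in pandas. The following is a minimal sketch on a small made-up frame (the column names and values are illustrative, not taken from the NYC taxi data). It only shows how pandas itself behaves; it does not show where inside PyCaret the call happens.

```python
import pandas as pd

# Small illustrative frame; the values are made up for this demonstration.
df = pd.DataFrame({"value": [1, 2, 2], "day_name": ["Mon", "Tue", "Tue"]})

# A Series has .unique(); it returns the distinct values in order of appearance.
distinct_days = list(df["day_name"].unique())

# A DataFrame has no .unique attribute, so calling it raises AttributeError.
try:
    df.unique()
    attribute_error = None
except AttributeError as exc:
    attribute_error = str(exc)

# Looking up a column that does not exist raises KeyError instead.
try:
    df["unique"]
    key_error_raised = False
except KeyError:
    key_error_raised = True

print(distinct_days)
print(attribute_error)
print(key_error_raised)
```

So the AttributeError in the question points to `.unique()` being called on a whole DataFrame where a single column (a Series) was expected, rather than to a missing column named 'unique'.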

It looks like a bug in the source code. You could try raising it on GitHub.

You also could monkey patch the data frame:

import pandas as pd

pd.DataFrame.unique = pd.DataFrame.drop_duplicates
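
One caveat with this patch is that `drop_duplicates` returns a DataFrame, while `Series.unique` returns an array of values, so code that expects the array form may still misbehave. A minimal sketch of the difference, using a made-up one-column frame (whether PyCaret's internals accept the DataFrame result is an assumption that is not verified here):

```python
import pandas as pd

# The monkey patch from the answer: alias DataFrame.unique to drop_duplicates.
pd.DataFrame.unique = pd.DataFrame.drop_duplicates

# Illustrative data only.
df = pd.DataFrame({"day_name": ["Mon", "Tue", "Tue"]})

patched = df.unique()             # resolves after the patch; returns a DataFrame
native = df["day_name"].unique()  # Series.unique returns an array of values

print(type(patched).__name__)
print(len(patched), len(native))
```

Both results contain the same two distinct values, but the container types differ, so the patch is best treated as a temporary workaround until the underlying issue is reported and fixed.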
s_pike