
I'm trying to apply some machine learning based regression on data from a CSV file. My columns are:

Index(['date', 'customer_id', 'product_category', 'payment_method',
       'value [USD]', 'time_on_site', 'clicks_in_site', 'USD/[Minutes]',
       'USD/clicks_in_site'],
      dtype='object')

When I run:

from pycaret.regression import * 
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

exp_reg = setup(data = df, target='value [USD]', session_id=123,
             high_cardinality_features = ['product_category'],
             normalize = True,
             ignore_features = ['customer_id', 'date', 'time_on_site']
             )

I get the following error message:

KeyError                                  Traceback (most recent call last)
<ipython-input-43-20eab85de0cc> in <module>()
      2              high_cardinality_features = ['product_category'],
      3              normalize = True,
----> 4              ignore_features = ['customer_id', 'date', 'time_on_site']
      5              )
      6 

5 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in drop(self, labels, errors)
   5285         if mask.any():
   5286             if errors != "ignore":
-> 5287                 raise KeyError(f"{labels[mask]} not found in axis")
   5288             indexer = indexer[~mask]
   5289         return self.delete(indexer)

KeyError: "['value [USD]'] not found in axis"
Lehas
  • Edit your question to add the missing `import` statements so we can see which module `setup()` comes from. SO requires you to post a minimal, complete, verifiable example (MCVE). – smci May 19 '21 at 08:26
  • thanks, put the imports in it! – Lehas May 19 '21 at 08:32
  • Thanks, we only need the absolute minimum number of imports to run your code example. Also, this question should be tagged [tag:python], gets more eyeballs on your question. This one doesn't really need [tag:pandas] tag. – smci May 19 '21 at 09:00

1 Answer

I found the solution. The column name 'value [USD]' was the problem. After renaming it so that it no longer contains brackets, the code works as intended. It probably has something to do with the brackets inside the column name, which may be interpreted as list or dictionary syntax somewhere along the way, but I'm not sure.
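For anyone hitting the same error, here is a minimal sketch of the rename workaround. It assumes, without confirmation, that the brackets (and possibly other special characters) in the column names are what trips up `setup()`. The helpers `sanitize_column_name` and `build_rename_mapping` are hypothetical names made up for this example and are not part of pycaret or pandas.

```python
import re


def sanitize_column_name(name):
    """Replace each run of non-alphanumeric characters with '_' and trim the ends."""
    return re.sub(r"[^0-9A-Za-z]+", "_", name).strip("_")


def build_rename_mapping(columns):
    """Map every original column name to a sanitized, unique replacement."""
    mapping = {}
    seen = set()
    for col in columns:
        base = sanitize_column_name(col) or "col"
        candidate, n = base, 1
        # Append a numeric suffix if two columns sanitize to the same name.
        while candidate in seen:
            n += 1
            candidate = f"{base}_{n}"
        seen.add(candidate)
        mapping[col] = candidate
    return mapping


# Column names copied from the question.
columns = ['date', 'customer_id', 'product_category', 'payment_method',
           'value [USD]', 'time_on_site', 'clicks_in_site', 'USD/[Minutes]',
           'USD/clicks_in_site']

mapping = build_rename_mapping(columns)

# With pandas, the mapping would be applied before calling setup():
#   df = df.rename(columns=mapping)
#   setup(data=df, target=mapping['value [USD]'], ...)
```

Renaming every column rather than only the target is a deliberate choice here: 'USD/[Minutes]' also contains brackets, so it could plausibly cause the same failure later.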

Lehas
  • Sounds like a bug in the pycaret package. Please check their [github issues](https://github.com/pycaret/pycaret/issues) and raise a new one if this isn't already listed. See e.g. [#247](https://github.com/pycaret/pycaret/issues/247) – smci May 19 '21 at 08:59
  • Meantime the workaround is obvious. Useful to title this something like 'pycaret can't index column names containing brackets'. – smci May 19 '21 at 09:06