
I keep getting AttributeError: 'DataFrame' object has no attribute 'column' when I run the following function on a column in a dataframe:

def reform (column, dataframe): 
    if dataframe.column.nunique() > 2 and dataframe.column.dtypes == object:
        enc.fit(dataframe[['column']])
        enc.categories_
        onehot = enc.transform(dataframe[[column]]).toarray()
        dataframe[enc.categories_] = onehot
    elif dataframe.column.nunique() == 2 and dataframe.column.dtypes == object :
        le.fit_transform(dataframe[['column']])
    else:
        print('Column cannot be reformed')
    return dataframe
Jan Wilamowski

1 Answer


Try changing

  • dataframe.column to dataframe.loc[:,column].
  • dataframe[['column']] to dataframe.loc[:,[column]]
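
Applied to the function from the question, the two changes might look like the sketch below. Several things here are assumptions, since the question does not show them: enc and le are taken to be a scikit-learn OneHotEncoder and LabelEncoder, the toy dataframe and its column names are made up, and pd.api.types.is_string_dtype stands in for the dtypes == object comparison so that text columns are detected however pandas stores them. Unlike the original, this version returns a new dataframe rather than modifying the input.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder


def reform(column, dataframe):
    """One-hot encode `column` if it has more than 2 categories,
    label-encode it if it has exactly 2, otherwise leave it alone."""
    # Look the column up by the *value* of the parameter, not the name 'column'.
    values = dataframe.loc[:, column]

    if not pd.api.types.is_string_dtype(values):
        print('Column cannot be reformed')
        return dataframe

    n_unique = values.nunique()
    if n_unique > 2:
        enc = OneHotEncoder()  # assumed to be what `enc` was in the question
        # The encoder wants 2-D input, hence the list [column].
        onehot = enc.fit_transform(dataframe.loc[:, [column]]).toarray()
        # categories_ holds one array per encoded column; we encoded only one.
        onehot_df = pd.DataFrame(
            onehot, columns=enc.categories_[0], index=dataframe.index
        )
        return pd.concat([dataframe, onehot_df], axis=1)

    if n_unique == 2:
        le = LabelEncoder()  # assumed to be what `le` was in the question
        result = dataframe.copy()
        # LabelEncoder wants 1-D input, so pass the Series and keep the result.
        result[column] = le.fit_transform(values)
        return result

    print('Column cannot be reformed')
    return dataframe


# Hypothetical toy data for trying the function out.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "smoker": ["yes", "no", "no", "yes"],
    "age": [31, 45, 27, 52],
})
encoded_color = reform("color", df)    # adds columns 'blue', 'green', 'red'
encoded_smoker = reform("smoker", df)  # replaces 'smoker' with 0/1 codes
unchanged = reform("age", df)          # numeric column, returned as-is
```

Note that the column name is passed as a string, e.g. reform("color", df), not as a bare identifier.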

For more help, please provide more information, such as: What is enc (show your imports)? What does dataframe look like (show a small example, perhaps with dataframe.head(5))?

Details: Since column is an input parameter (probably a string), you have to use its value when looking up the column. dataframe.column is attribute access, so pandas looks for a column literally named 'column' and raises the AttributeError when there is none. dataframe.loc[:,column] instead uses the string held by the input parameter named column.

  • With dataframe.loc[:,column], you get a Pandas Series, and with dataframe.loc[:,[column]] you get a Pandas DataFrame.

  • The pandas attribute 'columns', used as dataframe.columns (note the 's' at the end), returns an Index holding the names of all columns in your dataframe, which is probably not what you want here.
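
The differences described above can be checked on a tiny dataframe. This is only an illustrative sketch; the dataframe contents and column names are made up.

```python
import pandas as pd

# Hypothetical two-column dataframe, just for illustration.
df = pd.DataFrame({"city": ["Oslo", "Lund"], "population": [700000, 95000]})
column = "city"  # the column name is held in a variable, as in the function

# Attribute access looks for a column literally named 'column' and fails.
try:
    df.column
except AttributeError as err:
    message = str(err)

series = df.loc[:, column]    # single label  -> pandas Series
frame = df.loc[:, [column]]   # list of labels -> pandas DataFrame with one column
names = df.columns            # Index of all column names

print(message)
print(type(series).__name__, type(frame).__name__, list(names))
```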

TIPS:

  • Try to name input parameters so that you know what they hold, e.g. column_name instead of column for a string with the name of a column.

  • When developing a function, try setting the inputs to something static, and iterate on the code until you get the desired output. E.g.

      # Assumed setup (not shown in the question):
      # from sklearn.preprocessing import OneHotEncoder, LabelEncoder
      # enc, le = OneHotEncoder(), LabelEncoder()
      input_df = my_df
      column_name = 'some_test_column'
      values = input_df.loc[:,column_name]
      if values.nunique() > 2 and values.dtype == object:
          onehot = enc.fit_transform(input_df.loc[:,[column_name]]).toarray()
          # categories_ is a list with one array per encoded column
          input_df[list(enc.categories_[0])] = onehot
      elif values.nunique() == 2 and values.dtype == object:
          # LabelEncoder expects 1-D input, so pass the Series and store the result
          input_df[column_name] = le.fit_transform(values)
      else:
          print('Column cannot be transformed')
    
  • Look into scikit-learn Pipelines with ColumnTransformer. They make this kind of per-column workflow easier (https://scikit-learn.org/stable/modules/compose.html).
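
As a rough idea of what the ColumnTransformer route could look like, here is a minimal sketch. The dataframe and its column names are invented for illustration, and OrdinalEncoder is used for the two-category column in place of LabelEncoder, because LabelEncoder is intended for target labels and does not fit into a ColumnTransformer. Setting sparse_threshold=0 asks for a dense array as output.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical toy data; the column names are made up.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],  # more than 2 categories
    "smoker": ["yes", "no", "no", "yes"],        # exactly 2 categories
    "age": [31, 45, 27, 52],                     # numeric, passed through
})

# Split the text columns by their number of distinct values.
text_cols = [c for c in df.columns if pd.api.types.is_string_dtype(df[c])]
multi_cols = [c for c in text_cols if df[c].nunique() > 2]
binary_cols = [c for c in text_cols if df[c].nunique() == 2]

preprocess = ColumnTransformer(
    transformers=[
        ("onehot", OneHotEncoder(), multi_cols),     # one 0/1 column per category
        ("binary", OrdinalEncoder(), binary_cols),   # two categories -> 0.0 / 1.0
    ],
    remainder="passthrough",  # keep the remaining (numeric) columns unchanged
    sparse_threshold=0,       # always return a dense NumPy array
)

# Output columns: blue, green, red, smoker, age
transformed = preprocess.fit_transform(df)
print(transformed)
```

The fitted preprocess object can later be reused with preprocess.transform(...) on new data, which is harder to do with a function that fits the encoders on every call.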

Magnus Persson