A function for onehotencoding and labelencoding in a dataframe

Question

I keep getting AttributeError: 'DataFrame' object has no attribute 'column' when I run this function on a column of a dataframe:

def reform (column, dataframe): 
    if dataframe.column.nunique() > 2 and dataframe.column.dtypes == object:
        enc.fit(dataframe[['column']])
        enc.categories_
        onehot = enc.transform(dataframe[[column]]).toarray()
        dataframe[enc.categories_] = onehot
    elif dataframe.column.nunique() == 2 and dataframe.column.dtypes == object :
        le.fit_transform(dataframe[['column']])
    else:
        print('Column cannot be reformed')
    return dataframe

just use `columns` instead of `column` – Kang San Lee, Feb 21 '22 at 08:57

score 0 · Answer 1 · answered Feb 21 '22 at 10:04

Try changing

dataframe.column to dataframe.loc[:,column].
dataframe[['column']] to dataframe.loc[:,[column]]

For more help, please provide more information, such as: What is enc (show your imports)? What does dataframe look like (show a small example, perhaps with dataframe.head(5))?

Details: Since column is an input parameter (probably a string), you need to use its value when asking the dataframe object for that column. dataframe.column looks for a column literally named 'column', whereas dataframe.loc[:,column] uses the string held by the parameter named column.

With dataframe.loc[:,column], you get a Pandas Series, and with dataframe.loc[:,[column]] you get a Pandas DataFrame.
The pandas attribute 'columns', used as dataframe.columns (note the 's' at the end), returns an Index holding the names of all columns in your dataframe, which is probably not what you want here.
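To make the difference concrete, here is a minimal sketch. The toy dataframe and its column names are made up for illustration; only the access patterns come from the explanation above.

```python
import pandas as pd

# Hypothetical toy data, not the asker's dataframe.
df = pd.DataFrame({"colour": ["red", "green", "blue"], "size": [1, 2, 3]})
column = "colour"  # the column name arrives as a string, as in reform()

# Attribute access looks for a column literally named "column".
try:
    df.column
    attribute_error = None
except AttributeError as exc:
    attribute_error = str(exc)

as_series = df.loc[:, column]    # single label -> pandas Series
as_frame = df.loc[:, [column]]   # list of labels -> pandas DataFrame
all_names = df.columns           # Index with every column name

print(attribute_error)
print(type(as_series).__name__, type(as_frame).__name__, type(all_names).__name__)
```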

TIPS:

Try to name input parameters so that you know what they are, e.g. column_name and input_df rather than column and dataframe.

When developing a function, try setting the input to something static, and iterate on the code until you get the desired output. E.g.:

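  from sklearn.preprocessing import LabelEncoder, OneHotEncoder

  # Assumption: the question does not show how enc and le are created.
  enc = OneHotEncoder()
  le = LabelEncoder()

  input_df = my_df
  column_name = 'some_test_column'
  if input_df.loc[:,column_name].nunique() > 2 and input_df.loc[:,column_name].dtypes == object:
      enc.fit(input_df.loc[:,[column_name]])
      onehot = enc.transform(input_df.loc[:,[column_name]]).toarray()
      # categories_ holds one array per fitted column, so take the first one
      input_df[list(enc.categories_[0])] = onehot
  elif input_df.loc[:,column_name].nunique() == 2 and input_df.loc[:,column_name].dtypes == object :
      # LabelEncoder expects a 1-D Series; store the result instead of discarding it
      input_df[column_name] = le.fit_transform(input_df.loc[:,column_name])
  else:
      print('Column cannot be transformed')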
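Once the static version behaves, it can be wrapped back into a function. The following is only a sketch under assumptions: the encoders are created inside the function (the question does not show where enc and le come from), the function returns a new dataframe rather than modifying its input, and the sample data is invented.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder


def reform(column_name, input_df):
    """Return input_df with column_name encoded, or unchanged if not applicable."""
    series = input_df.loc[:, column_name]
    n_unique = series.nunique()
    if series.dtype == object and n_unique > 2:
        enc = OneHotEncoder()
        onehot = enc.fit_transform(input_df.loc[:, [column_name]]).toarray()
        # categories_ is a list with one array per fitted column.
        encoded = pd.DataFrame(onehot, columns=enc.categories_[0], index=input_df.index)
        return pd.concat([input_df, encoded], axis=1)
    if series.dtype == object and n_unique == 2:
        result = input_df.copy()
        # LabelEncoder works on 1-D input, so pass the Series itself.
        result[column_name] = LabelEncoder().fit_transform(series)
        return result
    print('Column cannot be reformed')
    return input_df


# Hypothetical sample data; dtype=object is set explicitly to match the dtype check.
sample = pd.DataFrame({
    "colour": pd.Series(["red", "green", "blue", "red"], dtype=object),
    "answer": pd.Series(["yes", "no", "yes", "no"], dtype=object),
    "size": [1, 2, 3, 4],
})
with_onehot = reform("colour", sample)
with_labels = reform("answer", sample)
unchanged = reform("size", sample)
```

Returning a new dataframe keeps the input untouched, which makes it easier to rerun the function while experimenting.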

Look up how to use scikit-learn Pipelines with ColumnTransformer. It will make the workflow easier (https://scikit-learn.org/stable/modules/compose.html).
