I am trying to append my data frame to a new data frame, but I am getting an 'Argument must be a string or number' error.

import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# The encoders
le = LabelEncoder()
ohc = OneHotEncoder()

for col in num_ohc_cols.index:
    # Integer encode the string categories
    dat = le.fit_transform(df_ohc[col]).astype(int)
    # Remove the original column from the dataframe
    df_ohc = df_ohc.drop(col, axis=1)
    # One hot encode the data; this returns a sparse array
    new_dat = ohc.fit_transform(dat.reshape(-1, 1))
    # Create unique column names
    n_cols = new_dat.shape[1]
    col_names = ['_'.join([col, str(x)]) for x in range(n_cols)]
    print(col_names)
    # Create the new dataframe

I'm getting the error here, when creating the new dataframe:

new_df = pd.DataFrame(new_dat.toarray(), index=df_ohc.index, columns=col_names)
desertnaut
teamzealot

2 Answers

This error occurs because your data contains both numbers and strings in the same column. One way to fix it is to convert all of the data to strings before encoding, as follows:

new_df = new_df.apply(lambda x: le.fit_transform(x.astype(str)), axis=0, result_type='expand')
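
For illustration, here is a minimal sketch of the mixed-type situation described above. The column `example_col` and its values are made up, and the exact error message depends on the scikit-learn version; the point is only that casting to `str` gives the encoder a uniform type to sort.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical column that mixes integers and strings
mixed = pd.Series([1, "a", 2, "a"], name="example_col")

le = LabelEncoder()

# Encoding the mixed column directly is expected to fail,
# because the encoder cannot sort ints against strings
try:
    le.fit_transform(mixed)
    mixed_error = None
except TypeError as exc:
    mixed_error = exc

# Casting every value to str first makes the column uniform
codes = le.fit_transform(mixed.astype(str))

print("error on mixed column:", mixed_error)
print("classes:", list(le.classes_))
print("codes:", codes.tolist())
```

Note that after the cast, the number 1 and the string "1" would end up in the same category, which may or may not be what you want.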
tersrth

I solved it by changing my append step to use pd.concat:

df_ohc = pd.concat([df_ohc, new_df], axis=1)
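
As a rough sketch of why this works: `pd.concat` with `axis=1` aligns rows on the index, so the one-hot columns line up with the remaining columns as long as `new_df` was built with `index=df_ohc.index`. The data and the names `price`, `color_0` and `color_1` below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe left over after dropping the categorical column
df_ohc = pd.DataFrame({"price": [10.0, 12.5, 9.0]}, index=[10, 11, 12])

# Stand-in for new_dat.toarray(): a dense one-hot array with two categories
encoded = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [1.0, 0.0]])
col_names = ["color_0", "color_1"]

# Reuse the original index so the rows align during concatenation
new_df = pd.DataFrame(encoded, index=df_ohc.index, columns=col_names)

# Column-wise concatenation; rows are matched on the shared index
df_ohc = pd.concat([df_ohc, new_df], axis=1)
print(df_ohc)
```

If the indexes did not match, `pd.concat` would take the union of both indexes and fill the gaps with NaN, so it is worth checking the shape of the result.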

teamzealot