
I was trying to fill NA values in categorical columns with a string, but an error occurred. I searched everywhere for a solution but found nothing. Please help me.


# specify columns with legit na values
legit_na_values_columns = ["MasVnrArea", "MasVnrType", "BsmtExposure", "BsmtCond", "BsmtFinType1", 
                          "BsmtFinType2", "BsmtQual", "BsmtQual", "GarageCond", "GarageQual", 
                          "GarageFinish", "GarageType", "Fireplaces", "Fence", "Alley", "MiscFeature", 
                          "PoolQC"]
num_legit_na = [i for i in df[legit_na_values_columns].columns if df[i].dtype in ["int", "float"]]
cat_legit_na = [i for i in df[legit_na_values_columns].columns if df[i].dtype=="object"]
df_handled = df.copy()
df_handled[cat_legit_na] = df_handled[cat_legit_na].fillna("not_exist")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-f772f77eee32> in <module>
      1 # fill categoric columns with legit na value with value "not_exist"
----> 2 df_handled[cat_legit_na] = df_handled[cat_legit_na].fillna("None")

3 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/frame.py in _set_item_frame_value(self, key, value)
   3727             len_cols = 1 if is_scalar(cols) else len(cols)
   3728             if len_cols != len(value.columns):
-> 3729                 raise ValueError("Columns must be same length as key")
   3730 
   3731             # align right-hand-side columns if self.columns

ValueError: Columns must be same length as key
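A minimal reproduction of the same error, using a small made-up DataFrame (the values are hypothetical, not from the original dataset). It assumes a pandas version whose list-key assignment goes through the same _set_item_frame_value check shown in the traceback. The only ingredient it needs is a column list that names the same column twice:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one categorical column with a missing value
df = pd.DataFrame({"BsmtQual": ["Gd", np.nan, "TA"], "LotArea": [8450, 9600, 11250]})

# The list names "BsmtQual" twice, like legit_na_values_columns in the question
cols = ["BsmtQual", "BsmtQual"]

error_message = None
try:
    # df[cols] has two columns both named "BsmtQual"; pandas assigns them back
    # one key at a time, so the single target column receives a 2-column frame
    df[cols] = df[cols].fillna("not_exist")
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

With a unique DataFrame and a duplicated list, the assignment should fail with the ValueError from the traceback, which points at the list rather than at the DataFrame.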

1 Answer

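This is caused by a duplicated column name. In your first list, legit_na_values_columns, the BsmtQual column is listed twice, so cat_legit_na contains it twice as well. df_handled[cat_legit_na] therefore returns two columns named BsmtQual, and assigning that frame back fails because pandas tries to put a two-column frame into the single BsmtQual column. The simplest fix is to remove the duplicate from the list. If the DataFrame itself has columns with the same name, you can drop or rename them instead: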

# remove the duplicate from the list (keeps the original order)
legit_na_values_columns = list(dict.fromkeys(legit_na_values_columns))

# drop columns with the same name from the DataFrame
df = df.loc[:, ~df.columns.duplicated()]

# or rename the columns; the list must contain every column of df, in order
df.columns = ["MasVnrArea", "MasVnrType", "BsmtExposure", "BsmtCond", "BsmtFinType1",
              "BsmtFinType2", "BsmtQual_1", "BsmtQual_2", "GarageCond", "GarageQual",
              "GarageFinish", "GarageType", "Fireplaces", "Fence", "Alley", "MiscFeature",
              "PoolQC"]
# If df has columns other than those in the list, run print(df.columns), copy the
# output into the list, and add a _1/_2 suffix to the duplicated names as above.
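A sketch of the whole fix end to end, on a hypothetical three-column frame standing in for the real dataset. It reports which names are repeated in the list, deduplicates the list, and then fills the categorical columns. Note that it swaps the question's manual dtype checks for DataFrame.select_dtypes and treats every non-numeric column as categorical; both are assumptions about what the original checks intended:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the real dataset
df = pd.DataFrame({
    "BsmtQual": ["Gd", np.nan, "TA"],
    "Fence": [np.nan, "MnPrv", np.nan],
    "MasVnrArea": [196.0, np.nan, 162.0],
})

# Same mistake as in the question: "BsmtQual" is listed twice
legit_na_values_columns = ["MasVnrArea", "BsmtQual", "BsmtQual", "Fence"]

# Report every name that appears more than once in the list
names = pd.Index(legit_na_values_columns)
print(names[names.duplicated()].tolist())  # ['BsmtQual']

# Remove the repeats while keeping the original order
legit_na_values_columns = list(dict.fromkeys(legit_na_values_columns))

# Split the (now unique) columns by dtype; non-numeric counts as categorical here
subset = df[legit_na_values_columns]
num_legit_na = subset.select_dtypes(include="number").columns.tolist()
cat_legit_na = subset.select_dtypes(exclude="number").columns.tolist()

# Fill only the categorical columns; numeric NaNs are left for separate handling
df_handled = df.copy()
df_handled[cat_legit_na] = df_handled[cat_legit_na].fillna("not_exist")
print(df_handled)
```

Deduplicating the list is enough here because the DataFrame's own columns are already unique; the drop and rename options above only matter when df.columns itself contains repeats.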



Bushmaster