
ISSUE

I am performing data cleansing. I have calculated a column mean for the rows that match a set of conditions passed to the .loc indexer. Storing the result in the variable z produces a (1, 1) DataFrame, which raises an incompatibility error when I try to assign it to a missing value.

WHAT'S BEEN TRIED

Input

a = train[['LotFrontage', 'MSSubClass', 'MSZoning', 'Street', 
           'LotShape']]

z = (train.loc[((train.MSSubClass == 190) & 
               (train.MSZoning == 'RL') & 
               (train.LotShape == 'IR1'))]
     .agg({'LotFrontage': ['mean']}))

a.LotFrontage[335] = z

Output

ValueError: Incompatible indexer with DataFrame

QUESTIONS

  1. Is it possible to store the .mean() output as an integer in z to fix this issue?
  2. If above is not possible, is there a different method I should be using to replace the missing LotFrontage value with the calculated mean?
P-Sides

2 Answers


You can use

z = (train.loc[((train.MSSubClass == 190) & 
               (train.MSZoning == 'RL') & 
               (train.LotShape == 'IR1'))]
     .agg({'LotFrontage': 'mean'})  # a string (not a list) yields a one-element Series
     .item())  # Return the single element of that Series as a scalar
# or
z = (train.loc[((train.MSSubClass == 190) & 
                (train.MSZoning == 'RL') & 
                (train.LotShape == 'IR1'))]
     ['LotFrontage'].mean())


# How you build the mask depends on how missing values appear in the `LotFrontage` column:
# if they are empty strings, use `.eq('')`
# if they are NaN, use `.isna()`
m = a['LotFrontage'].isna()
a.loc[m, 'LotFrontage'] = z
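
To see why the original assignment failed, it may help to compare what each form returns. The sketch below uses a small made-up stand-in for `train` (the values are assumptions, not the asker's data) and shows that passing a list of functions to `.agg()` gives a (1, 1) DataFrame, while a single function name or a direct `.mean()` gives something that reduces to a scalar.

```python
import pandas as pd

# Hypothetical toy data standing in for `train`; values are invented.
train = pd.DataFrame({
    "MSSubClass": [190, 190, 20, 190],
    "MSZoning": ["RL", "RL", "RL", "RM"],
    "LotShape": ["IR1", "IR1", "Reg", "IR1"],
    "LotFrontage": [60.0, 80.0, 70.0, 50.0],
})

# Rows matching all three conditions (the first two rows here).
rows = train.loc[(train.MSSubClass == 190)
                 & (train.MSZoning == "RL")
                 & (train.LotShape == "IR1")]

# A list of functions produces a DataFrame with one row per function.
as_frame = rows.agg({"LotFrontage": ["mean"]})

# A single function name produces a Series with one entry per column.
as_series = rows.agg({"LotFrontage": "mean"})

# Selecting the column first and calling .mean() produces a plain float.
as_scalar = rows["LotFrontage"].mean()

print(type(as_frame).__name__, as_frame.shape)
print(type(as_series).__name__, as_series.item())
print(as_scalar)
```

Only the last two forms give a value that can be assigned to a single cell without further unpacking.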
Ynjxsjmh

If I understand what you're trying to do correctly... this may work.

z = (train.loc[train.MSSubClass.eq(190) 
               & train.MSZoning.eq('RL') 
               & train.LotShape.eq('IR1'), 'LotFrontage']
          .mean()) # Returns a float.

a.loc[335, 'LotFrontage'] = z

# Or, for all nans in LotFrontage:

a.LotFrontage = a.LotFrontage.fillna(z)
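
One possible variation, sketched on invented data: take an explicit `.copy()` of the column subset so that later assignments do not act on a slice of `train`, and restrict the fill to missing values inside the matching group rather than every NaN in the column. The toy frame and the decision to fill only within the group are assumptions for illustration, not part of the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for `train`; values are invented.
train = pd.DataFrame({
    "MSSubClass": [190, 190, 190, 20],
    "MSZoning": ["RL", "RL", "RL", "RL"],
    "LotShape": ["IR1", "IR1", "IR1", "Reg"],
    "LotFrontage": [60.0, 80.0, np.nan, 70.0],
})

# .copy() gives an independent frame, so writing to `a` never touches `train`.
a = train[["LotFrontage", "MSSubClass", "MSZoning", "LotShape"]].copy()

# Boolean mask for the group of interest.
group = a.MSSubClass.eq(190) & a.MSZoning.eq("RL") & a.LotShape.eq("IR1")

# .mean() skips NaN by default, so only the known values are averaged.
z = a.loc[group, "LotFrontage"].mean()

# Fill only the missing values that belong to the group.
a.loc[group & a.LotFrontage.isna(), "LotFrontage"] = z

print(a)
```

If every NaN in the column should receive the same value, the `fillna(z)` line above is the shorter route.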
BeRT2me
  • I haven't seen `.eq()` used before. It looks pretty useful; I'll have to read more into it. I also completely glossed over being able to pass both the conditions and the column to `.loc`. That simplified things a lot. – P-Sides Jul 23 '22 at 21:55
  • Ye, `eq()`, `lt()`, `gt()`, `ne()`, `ge()` etc. are all useful, and they help reduce the clutter of `()` everywhere :') – BeRT2me Jul 23 '22 at 21:56
  • Do you have a link or term I could search so I could research that list? – P-Sides Jul 23 '22 at 22:23
  • 1
    https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.eq.html – BeRT2me Jul 23 '22 at 22:28
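
As a small illustration of the point made in the comments, the sketch below compares the operator form with the method form on two invented Series. The operator form needs parentheses around each comparison because `&` binds more tightly than `==` or `>=` in Python; the method form does not.

```python
import pandas as pd

# Hypothetical toy Series; values are invented.
s = pd.Series([10, 20, 30])
flag = pd.Series(["x", "y", "x"])

# Operator form: each comparison must be wrapped in parentheses.
with_ops = (s >= 20) & (flag == "x")

# Method form: same result without the extra parentheses.
with_methods = s.ge(20) & flag.eq("x")

print(with_ops.equals(with_methods))
```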