
I want to highlight the outliers by giving them a different colour from the rest of the points, so that the change in the scatterplot after removing them is clearer.

import matplotlib.pyplot as plt
import seaborn as sns

# TotalBsmtSF: Total square feet of basement area
# df is the housing DataFrame, assumed to be loaded already

fig = plt.figure(figsize=(16, 8))

# Top panel: all houses, outliers included
ax1 = fig.add_subplot(211)
b = sns.scatterplot(x='TotalBsmtSF', y='SalePrice', data=df, ax=ax1)
ax1.set_title('Total square feet of basement area VS SalePrice (With Outliers)', fontsize=13)

# Removing houses with a total basement area of more than 3000 square feet
# and a sale price of at least 160000
df = df.drop(df[(df['TotalBsmtSF'] > 3000) & (df['SalePrice'] >= 160000)].index)
# print(df['TotalBsmtSF'].head(450))

# Bottom panel: outliers removed
ax2 = fig.add_subplot(212)
b = sns.scatterplot(x='TotalBsmtSF', y='SalePrice', data=df, ax=ax2)
ax2.set_title('Total square feet of basement area VS SalePrice (Outliers Removed)', fontsize=13)

plt.close(2)
plt.close(3)
plt.tight_layout()
takahashi

1 Answer


Seaborn lets you colour the markers based on categorical or numerical data. So you could create a new column that marks whether each data point is an outlier, then pass that column to the hue parameter of scatterplot. These are the lines to add or change in your code; the new column has to be created before the outliers are dropped:

# requires: import numpy as np
df['outlier'] = np.where((df['TotalBsmtSF'] > 3000) & (df['SalePrice'] >= 160000), 'yes', 'no')
b = sns.scatterplot(x='TotalBsmtSF', y='SalePrice', data=df, ax=ax1, hue='outlier')

I think this should work, but I can't confirm it since I don't have the data to test with.
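For anyone without the dataset, here is a minimal sketch of the same idea on made-up data. The `demo` frame, its values, the `palette` dict and the panel titles are all assumptions for illustration, not taken from the question's data; only the two thresholds and the `hue` approach come from the posts above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up stand-in for the housing data (values are illustrative only)
demo = pd.DataFrame({
    "TotalBsmtSF": [800, 1200, 1500, 2100, 3200, 6100],
    "SalePrice": [120000, 150000, 180000, 250000, 280000, 160000],
})

# Flag rows that meet both conditions used in the question
is_outlier = (demo["TotalBsmtSF"] > 3000) & (demo["SalePrice"] >= 160000)
demo["outlier"] = np.where(is_outlier, "yes", "no")

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(16, 8))

# Fixed label-to-colour mapping (an assumption) so both panels use the same colours
palette = {"no": "tab:blue", "yes": "tab:red"}

# Top panel: every point, coloured by the outlier flag
sns.scatterplot(x="TotalBsmtSF", y="SalePrice", data=demo,
                hue="outlier", palette=palette, ax=ax1)
ax1.set_title("With outliers highlighted")

# Bottom panel: the same plot with the flagged rows filtered out
cleaned = demo[demo["outlier"] == "no"]
sns.scatterplot(x="TotalBsmtSF", y="SalePrice", data=cleaned,
                hue="outlier", palette=palette, ax=ax2)
ax2.set_title("Outliers removed")

fig.tight_layout()
```

Filtering on the flag column instead of calling `drop` keeps the original frame intact, so the highlighted and cleaned panels can be redrawn without reloading the data.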

Novice