The dataset is below:
,store id,revenue ,profit
0,101,779183,281257
1,101,144829,838451
2,101,766465,757565
3,101,353297,261071
4,101,1615461,275760
5,101,246731,949229
6,101,951518,301016
7,101,444669,430583
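How pandas parses this header line determines the column names the code relies on. Below is a minimal sketch. It assumes the eight rows above are inlined as a string (SAMPLE is a hypothetical name, and io.StringIO stands in for data.csv). With the default header=0, the first line becomes the column labels: the empty first field turns into 'Unnamed: 0' and the trailing space in 'revenue ' is kept. With header=None, that line is read as an ordinary data row, the labels are the integers 0 to 3, and the .str accessor on them raises AttributeError.

```python
import io

import pandas as pd

# Hypothetical inline copy of the sample rows shown above
SAMPLE = """,store id,revenue ,profit
0,101,779183,281257
1,101,144829,838451
2,101,766465,757565
3,101,353297,261071
4,101,1615461,275760
5,101,246731,949229
6,101,951518,301016
7,101,444669,430583
"""

# Default header=0: the first line supplies the column labels
df = pd.read_csv(io.StringIO(SAMPLE))
print(list(df.columns))  # ['Unnamed: 0', 'store id', 'revenue ', 'profit']

# header=None: the header line is read as row 0 and the labels are integers
raw = pd.read_csv(io.StringIO(SAMPLE), header=None)
print(list(raw.columns))  # [0, 1, 2, 3]

# String methods are not available on integer column labels
try:
    raw.columns.str.replace(' ', '')
except AttributeError as exc:
    print('AttributeError:', exc)
```

Reading with the default header and then removing spaces yields the names 'Unnamed:0', 'storeid', 'revenue' and 'profit' that the code below expects.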
The code is below:
import pandas as pd
import numpy as np
import pylab
from sklearn.preprocessing import StandardScaler
from pylab import rcParams
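# Keep the default header=0 so the first line supplies the column names;
# with header=None the labels are integers and .str.replace below fails
df = pd.read_csv(r'data.csv', sep=',')
# Remove spaces from the names: 'store id' -> 'storeid', 'revenue ' -> 'revenue',
# 'Unnamed: 0' -> 'Unnamed:0'
df.columns = df.columns.str.replace(' ', '')
dummies = pd.get_dummies(data = df)
# Drop the unnamed index column
del dummies['Unnamed:0']
store = dummies[['storeid']]
test = dummies[['profit']]
# Column whose quartiles and outliers are computed below
param = 'profit'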
qv1 = test[param].quantile(0.25)
qv2 = test[param].quantile(0.5)
qv3 = test[param].quantile(0.75)
qv_limit = 1.5 * (qv3 - qv1)
qv_limit,qv3,qv1
#(688855.5, 776026.0, 316789.0)
un_outliers_mask = (test[param] > qv3 + qv_limit) | (test[param] < qv1 - qv_limit)
un_outliers_data = test[param][un_outliers_mask]
un_outliers_name = store[un_outliers_mask]
un_outliers_data
The output of un_outliers_data is Series([], Name: profit, dtype: int64), i.e. an empty Series. Yet some points are clearly outliers: for example, 1615461 > 776026.0 + 688855.5 = 1464881.5. Why is the result empty?
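One thing worth checking is that 1615461 appears in the revenue column (row 4 of the sample), whereas the mask is computed from test[param], which holds only profit. Below is a minimal sketch that applies the same 1.5 * IQR rule to each column separately. It assumes only the eight sample rows above, inlined as the hypothetical SAMPLE string, so its fences will not match the values quoted in the question. The helper name iqr_outliers is illustrative.

```python
import io

import pandas as pd

# Hypothetical inline copy of the sample rows shown above
SAMPLE = """,store id,revenue ,profit
0,101,779183,281257
1,101,144829,838451
2,101,766465,757565
3,101,353297,261071
4,101,1615461,275760
5,101,246731,949229
6,101,951518,301016
7,101,444669,430583
"""

df = pd.read_csv(io.StringIO(SAMPLE))
df.columns = df.columns.str.replace(' ', '')
del df['Unnamed:0']  # drop the unnamed index column


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values of s outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = s.quantile(0.25)
    q3 = s.quantile(0.75)
    limit = k * (q3 - q1)
    return s[(s < q1 - limit) | (s > q3 + limit)]


# Check each numeric column on its own
for col in ['revenue', 'profit']:
    print(col, iqr_outliers(df[col]).tolist())
# revenue [1615461]
# profit []
```

With pandas' default linear interpolation on these rows, the upper revenue fence is 822266.75 + 743416.875 = 1565683.625, so 1615461 is flagged. The largest profit, 949229, stays below its upper fence of 1524642.125.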