
Dataset is below

,store id,revenue ,profit
0,101,779183,281257
1,101,144829,838451
2,101,766465,757565
3,101,353297,261071
4,101,1615461,275760
5,101,246731,949229
6,101,951518,301016
7,101,444669,430583

Code is below

import pandas as pd
import numpy as np
import pylab
from sklearn.preprocessing import StandardScaler
from pylab import rcParams

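# The first row of data.csv holds the column names, so read it as the header
df = pd.read_csv(r'data.csv', sep=',')
# Strip spaces from the names: 'store id' -> 'storeid', 'revenue ' -> 'revenue'
df.columns = df.columns.str.replace(' ', '')
dummies = pd.get_dummies(data=df)
# The unnamed index column is read as 'Unnamed: 0', i.e. 'Unnamed:0' after the replace
del dummies['Unnamed:0']
store = dummies[['storeid']]
test = dummies[['profit']]
param = 'profit'  # column checked for outliers
# Quartiles and the 1.5 * IQR limit
qv1 = test[param].quantile(0.25)
qv2 = test[param].quantile(0.5)
qv3 = test[param].quantile(0.75)
qv_limit = 1.5 * (qv3 - qv1)
qv_limit, qv3, qv1
#(688855.5, 776026.0, 316789.0)
# A value is an outlier if it lies above Q3 + limit or below Q1 - limit
un_outliers_mask = (test[param] > qv3 + qv_limit) | (test[param] < qv1 - qv_limit)
un_outliers_data = test[param][un_outliers_mask]
un_outliers_name = store[un_outliers_mask]
un_outliers_data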

The output of un_outliers_data is an empty Series: Series([], Name: profit, dtype: int64). I expected some points to be flagged as outliers; for example, 1615461 > (776026.0 + 688855.5).
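
For reference, here is a minimal sketch of the same 1.5 * IQR check applied to the eight sample rows above, run on both profit and revenue. The helper name iqr_outliers, its k parameter, and the inline SAMPLE_CSV string are assumptions made for illustration and are not part of the original code. In the sample rows, the value 1615461 sits in the revenue column, while the mask in the question is built from profit, which may be why the result comes back empty.

```python
import io

import pandas as pd

# Inline copy of the eight sample rows from the question (hypothetical stand-in for data.csv)
SAMPLE_CSV = """,store id,revenue ,profit
0,101,779183,281257
1,101,144829,838451
2,101,766465,757565
3,101,353297,261071
4,101,1615461,275760
5,101,246731,949229
6,101,951518,301016
7,101,444669,430583
"""


def iqr_outliers(series, k=1.5):
    """Return the values of `series` lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    limit = k * (q3 - q1)
    mask = (series > q3 + limit) | (series < q1 - limit)
    return series[mask]


# Use the first (unnamed) column as the index and strip spaces from the column names
df = pd.read_csv(io.StringIO(SAMPLE_CSV), index_col=0)
df.columns = df.columns.str.replace(' ', '')

profit_outliers = iqr_outliers(df['profit'])
revenue_outliers = iqr_outliers(df['revenue'])

print(profit_outliers)
print(revenue_outliers)
```

On these eight rows only, the profit check returns nothing, while the revenue check flags row 4 (1615461). The quartiles quoted in the question come from the full dataset, so the numbers there will differ.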

  • What do you really want? If you want to know how to find the IQR of the dataset, you can look here: https://www.khanacademy.org/math/ap-statistics/summarizing-quantitative-data-ap/measuring-spread-quantitative/v/calculating-interquartile-range-iqr – Darkknight Jun 09 '20 at 03:26
  • @Darkknight The sample dataset URL is: https://raw.githubusercontent.com/mak705/Pandas_Projects/master/store.csv –  Jun 09 '20 at 03:39
