Average calculation by eliminating top 10% & bottom 10% values using T-SQL

Question

I need to calculate the average duration of one of our running stored procedures. For example the duration(in seconds) of SP are: 1,30,2,5,2,15,35,7,3,4,2,1,2,40

I have to eliminate top 10% calls (fast calls) & bottom 10% calls (slow calls) & calculate the average on the rest.

Is there a better approach minimizing the performance hit since this has to done on a huge data set in a regular basis?

The approach I'm aware is:

Eliminate top 10% records using following query which results two values (1,1)

SELECT TOP 10 PERCENT WITH TIES value FROM #t order by value asc

bottom 10% gives (35,40),

SELECT TOP 10 PERCENT WITH TIES value FROM #t order by value desc

After eliminating those values(1,1,35 & 40) the average is going to be 7.

By top 10% you mean eliminate 10% of the results by count, not those that fall into the top 10% of fastest performance, right? So if you had 10 results, you'd eliminate the top one, the bottom one (even if there were duplicate values for top or bottom, you only throw away one), and average the rest, correct? Just want to be clear. — pmbAustin, Jun 08 '17 at 22:41
What's wrong with the approach that you're aware of? For example, `SELECT AVG(value * 1.0) FROM #t WHERE value NOT IN (SELECT TOP 10 PERCENT WITH TIES value FROM #t ORDER BY value) AND value NOT IN (SELECT TOP 10 PERCENT WITH TIES value FROM #t ORDER BY value DESC)`? — ZLK, Jun 08 '17 at 23:17
@sqlcheckpoint did either of the answers below solve your problem? — APH, Jun 22 '17 at 22:50

score 1 · Answer 1 · answered May 25 '21 at 01:11

Slightly different approach, but if goal is to cut off outliers, this should work. This theoretically should perform faster than other methods using NTILE or ROW_NUMBER as those have to scan the entire result set and then filter. This would only scan what it needs and should perform well with the right index

DROP TABLE IF EXISTS #tbl_test

CREATE TABLE #tbl_test (val INT)
INSERT INTO #tbl_test 
VALUES (1),(30),(2),(5),(2),(15),(35),(7),(3),(4),(2),(1),(2),(40)

Declare @RowCount INT = (SELECT COUNT(*) FROM #tbl_test)
Declare @TenthOfTableRowCount INT = (Select CEILING(@RowCount/10.0))

;WITH cte_Middle80Percent AS (
    SELECT *
    FROM #tbl_test
    ORDER BY val
    OFFSET (@TenthOfTableRowCount) ROWS 
    FETCH NEXT (@RowCount - @TenthOfTableRowCount*2) ROWS ONLY
)

SELECT AVG(val) AS AvgVal
FROM cte_Middle80Percent

John Cappelletti · Answer 2 · 2017-06-09T00:03:14.300

0

Edited to Ensure an Even "Haircut"

Declare @YourTable table (Seconds int)
Insert Into @YourTable values
(1),(30),(2),(5),(2),(15),(35),(7),(3),(4),(2),(1),(2),(40)

Select AvgSeconds = avg(Seconds)
 From (
        Select *
              ,Dec1 = NTile(10) over (Order By Seconds)
              ,Dec2 = NTile(10) over (Order By Seconds Desc)
          From  @YourTable
      ) A
 Where Dec1 between 2 and 9
   and Dec2 between 2 and 9

Returns

AvgSeconds
7

edited Jun 09 '17 at 00:03

answered Jun 08 '17 at 22:49

John Cappelletti

79,615
7
44
66

While this _seems_ like one of the rare instances where NTILE is actually useful, the problem with NTILE is if the integer (in this case, 10) doesn't divide evenly into the total count (in this case, 14, thus it doesn't divide evenly) you get uneven results (in this case, 2 from the bottom and 1 from the top). – ZLK Jun 08 '17 at 23:10
@ZLK True, but I tend to work with larger samples so any variance would get lost in the dust. I did however make an edit to ensure an even haircut. Appreciate the note. – John Cappelletti Jun 08 '17 at 23:55

Average calculation by eliminating top 10% & bottom 10% values using T-SQL

2 Answers2