-1

I've got an array of a lot of numbers. These are daily readings over a 4 month period. Each row represents a different device. I need to calculate the trendline and ignore the outliers. I tried calculating the IQR, then the range, and then filtering out the outliers that are not in the range. the problem is that there are a lot of repetitive results so the q1 and the q3 could very easily be the same number. is there a way to select the central 90% of results in a range? lets say from: [1,1,1,1,1,1,1,1,2,3,4,5,7,8,9,9,9,9,9,9] select [1,1,1,1,1,1,1,2,3,4,5,7,8,9,9,9,9,9] to make life harder, I'd also like to be able to select the associated date with the day the measurement was taken. the dates are in a separate row.

  • Not sure if I understood properly but based on your example, if you exclude some numbers of your sample, then you are distorting results. I agree you need to manage outliers, but as I said, your example sample looks good. – Foxfire And Burns And Burns Oct 06 '21 at 08:04

1 Answers1

0

I also would advise caution when omitting outliers. Here is an article providing guidelines for when it's appropriate: https://statisticsbyjim.com/basics/remove-outliers/

That being said, the following formula will return a dynamic array that filters out the bottom 5% and the top 5%. It assumes your data range is A1:A20.

=FILTER($A$1:$A$20,(ROW($A$1:$A$20)>0.05*COUNT($A$1:$A$20))*ROW($A$1:$A$20)<0.95*COUNT($A$1:$A$20)) 
Bryan Rock
  • 582
  • 1
  • 4
  • 13