Is there a good algorithm for detecting outliers in small sets of decimal numbers? The best idea I have come up with so far is a kind of recursive standard deviation based approach, but it seems a bit computationally expensive.

I'm using C++, so any existing functionality in, say, Boost or other maths helper libraries is welcome in your answers.

Thanks.

technorabble
  • it seems you got the wrong stack* site... are you looking for math...? http://math.stackexchange.com/ – elcuco Dec 04 '13 at 21:04
  • @elcuco I think it's on topic for SO, since the OP mentioned computational efficiency. – ApproachingDarknessFish Dec 04 '13 at 21:07
  • just how "small" are these sets? 1/5/10 - which one's the outlier? – Marc B Dec 04 '13 at 21:08
  • @ValekHalfHeart while I think that this is a great question... I do think that he will get better answers in a dedicated site with math people. – elcuco Dec 04 '13 at 21:08
  • According to WIKI "There is no rigid mathematical definition of what constitutes an outlier; determining whether or not an observation is an outlier is ultimately a subjective exercise." So you probably need to define criteria and then ask for implementation. – Slava Dec 04 '13 at 21:12
  • @elcuco http://stats.stackexchange.com/ would be a good site also. – Geobits Dec 04 '13 at 21:12
  • You can do it in O(n) time with an online variance algorithm (http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm) and then a second pass to mark outliers. – IdeaHat Dec 04 '13 at 21:13
  • @MadScienceDreams make it an answer!!! – elcuco Dec 04 '13 at 21:24

1 Answer

You can do it in O(n) time with an online variance algorithm (http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm) and then a second pass to mark outliers.
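
A minimal sketch of that two-pass idea in C++, using Welford's online update for the mean and variance. The names `RunningStats` and `find_outliers`, and the default cutoff of 2 standard deviations, are assumptions made for illustration; they are not from any library, and the right cutoff depends on how you define an outlier.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Running mean and variance via Welford's online update (hypothetical helper).
struct RunningStats {
    std::size_t n = 0;
    double mean = 0.0;
    double m2 = 0.0;  // sum of squared deviations from the current mean

    void push(double x) {
        ++n;
        const double delta = x - mean;
        mean += delta / static_cast<double>(n);
        m2 += delta * (x - mean);
    }

    // Sample variance (n - 1 denominator); 0 when fewer than two values.
    double variance() const {
        return n > 1 ? m2 / static_cast<double>(n - 1) : 0.0;
    }
};

// Returns the indices of values lying more than `threshold` standard
// deviations from the mean. The threshold of 2.0 is an assumed default.
std::vector<std::size_t> find_outliers(const std::vector<double>& values,
                                       double threshold = 2.0) {
    // Pass 1: accumulate mean and variance in O(n).
    RunningStats stats;
    for (double v : values) stats.push(v);

    const double sd = std::sqrt(stats.variance());
    std::vector<std::size_t> outliers;
    if (sd == 0.0) return outliers;  // all values equal, nothing to mark

    // Pass 2: mark values far from the mean.
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (std::fabs(values[i] - stats.mean) > threshold * sd) {
            outliers.push_back(i);
        }
    }
    return outliers;
}
```

For example, calling `find_outliers` on `{10, 11, 9, 10, 12, 10, 11, 50}` marks only index 7 (the value 50). Keep in mind that in very small sets the extreme value itself inflates the standard deviation, so the cutoff may need tuning.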

IdeaHat