30

I am creating a boxplot generator in Ruby, and I need to calculate some things.

Let's say I have this array:

arr = [1, 5, 7, 2, 53, 65, 24]

How can I find the lowest value (1), highest value (65), total (157), average (22.43) and median (7) from the above array?

Thanks

  • I would recommend changing the line total = arr.inject(:+) to total = arr.inject(0, :+) to avoid getting a nil value when the array is empty – user1283153 Dec 15 '12 at 22:00

2 Answers

66
lowest = arr.min
highest = arr.max
total = arr.inject(:+)
len = arr.length
average = total.to_f / len # to_f so we don't get an integer result
sorted = arr.sort
median = len % 2 == 1 ? sorted[len/2] : (sorted[len/2 - 1] + sorted[len/2]).to_f / 2
Drew Johnson
sepp2k
  • 4
    You need to be a bit more careful with the median, in case `arr.length` is divisible by 2. A method that should always work is `begin sortedarr = arr.sort ; medpt1 = (arr.length - 1) / 2 ; medpt2 = arr.length / 2 ; (sortedarr[medpt1] + sortedarr[medpt2]).to_f / 2 ; end`, but obviously that's more expensive, and not as nice and pretty, as what you have in your answer. – Aidan Cully Jun 03 '10 at 15:59
  • 1
    One minor note: arr.inject(:+) will only work in Ruby 1.8.7 or greater (or if another library has implemented Symbol#to_proc, as Rails' ActiveSupport does). Otherwise, arr.inject {|sum, n| sum + n} would work. – Greg Campbell Jun 03 '10 at 17:19
  • 3
    @GregCampbell: `arr.inject(:+)` does not invoke `Symbol#to_proc`, inject invokes `rb_funcall` directly when given a symbol (which is a lot faster than passing a block (or worse using Symbol#to_proc)). But you're right that it only works in 1.8.7+. – sepp2k Jun 03 '10 at 17:30
  • Since you're sorting the array to find the median, would it be more efficient overall to find the minimum and maximum with sorted.first and sorted.last? – David Aldridge Dec 15 '12 at 22:11
  • @DavidAldridge in absolute terms yes; however, the sort is O(n log n), which dominates the O(n) cost of the min and max operations for large datasets anyway. – Fryie Dec 26 '12 at 18:33
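
The suggestions from the comments can be combined into one method. This is only a minimal sketch, not sepp2k's code: the name `box_stats` and the returned hash keys are made up for illustration, it assumes Ruby 1.9 or later (for the hash syntax), and it returns nil for an empty array because min, max, average and median are undefined there.

```ruby
# Hypothetical helper (not part of the original answer): boxplot summary values.
def box_stats(arr)
  return nil if arr.empty?        # nothing meaningful to report for no data

  sorted = arr.sort               # one O(n log n) sort, reused for min, max and median
  len    = sorted.length
  total  = sorted.inject(0, :+)   # explicit initial value, as suggested in the comments
  mid    = len / 2
  # Odd length: the middle element. Even length: mean of the two middle elements.
  median = len.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0

  {
    lowest:  sorted.first,
    highest: sorted.last,
    total:   total,
    average: total.to_f / len,    # to_f avoids integer division
    median:  median
  }
end

stats = box_stats([1, 5, 7, 2, 53, 65, 24])
```

For the array from the question, `stats[:lowest]` is 1, `stats[:highest]` is 65, `stats[:total]` is 157 and `stats[:median]` is 7.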
1

Finding the minimum, maximum, sum and average is trivial; each can be computed in linear time, as shown by sepp2k's answer above.

Finding the median is less trivial, and the naive implementation (sorting, then taking the middle element) runs in O(n log n) time.

There are, however, algorithms that find the median in linear time, such as median of medians (which works on groups of 5 elements). Selection algorithms of this kind also work for any order statistic (say, you want to find the 5th-smallest element). The drawback is that you would have to implement them yourself; I know of no Ruby implementation.
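
To give an idea of what such an implementation might look like, here is a rough sketch of quickselect. Note that this is a simpler technique than median of medians: with a random pivot it runs in expected linear time, but the worst case is quadratic. The names `kth_smallest` and `linear_median` are made up for this example, and `Array#sample` assumes Ruby 1.9 or later.

```ruby
# Hypothetical helper: returns the k-th smallest element of arr (k is 0-based).
# Quickselect with a random pivot: expected O(n) time, O(n^2) in the worst case.
def kth_smallest(arr, k)
  raise ArgumentError, "k out of range" unless k >= 0 && k < arr.length

  items = arr                                   # never mutated; select returns new arrays
  loop do
    pivot   = items.sample                      # random pivot
    smaller = items.select { |x| x < pivot }
    equal   = items.count { |x| x == pivot }    # at least 1, so each pass shrinks the input

    if k < smaller.length
      items = smaller                           # answer lies among the smaller elements
    elsif k < smaller.length + equal
      return pivot                              # answer is the pivot itself
    else
      k -= smaller.length + equal               # skip everything <= pivot
      items = items.select { |x| x > pivot }
    end
  end
end

# Hypothetical helper: median without a full sort.
def linear_median(arr)
  len = arr.length
  return nil if len.zero?

  upper = kth_smallest(arr, len / 2)
  return upper if len.odd?
  (kth_smallest(arr, len / 2 - 1) + upper) / 2.0  # even length: mean of the two middle values
end

median = linear_median([1, 5, 7, 2, 53, 65, 24])
fifth  = kth_smallest([1, 5, 7, 2, 53, 65, 24], 4)  # 5th-smallest element
```

For the array from the question this gives a median of 7 and a 5th-smallest element of 24. Whether it actually beats `arr.sort` in practice would have to be measured, since Ruby's built-in sort is implemented in C.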

O(n log n) is quite fast already, so if you're not planning on working with huge datasets (and especially if you need to sort your data anyway), you'll be fine with that.

Fryie