21

I was looking for an Array equivalent of String#split in Ruby Core, and was surprised to find that it did not exist. Is there a more elegant way than the following to split an array into sub-arrays based on a value?

class Array
  def split( split_on=nil )
    inject([[]]) do |a,v|
      a.tap{
        if block_given? ? yield(v) : v==split_on
          a << []
        else
          a.last << v
        end
      }
    end.tap{ |a| a.pop if a.last.empty? }
  end
end

p (1..9 ).to_a.split{ |i| i%3==0 },
  (1..10).to_a.split{ |i| i%3==0 }
#=> [[1, 2], [4, 5], [7, 8]]
#=> [[1, 2], [4, 5], [7, 8], [10]]

Edit: For those interested, the "real-world" problem which sparked this request can be seen in this answer, where I've used @fd's answer below for the implementation.

Phrogz
  • Well, in Python you could convert it into a string (values separated by commas or something), split that, and then go back to a list. Dunno if that's an option in Ruby. – Rafe Kettler Jan 26 '11 at 00:24
  • @Rafe It would be, but only if the contents were only strings. Even then, that could hardly be considered elegant. :p – Phrogz Jan 26 '11 at 00:38
  • @Phrogz if they were numbers it'd work fine too. You'd just do `','.join([str(x) for x in list_of_nums])`, then split on whatever, then rejoin and split on commas. Functional, yes, elegant, eh no. – Rafe Kettler Jan 26 '11 at 00:45
  • @Rafe Perhaps I should also accept answers for most roundabout hack. To/from YAML, anyone? :) – Phrogz Jan 26 '11 at 01:22
  • FYI: I don't see anything in your solution that requires `self` to be an `Array`. You could pull that method up into `Enumerable`, since you only depend on `self` responding to `inject`. (Incidentally, that also would allow you to get rid of the `to_a` in your two testcases.) – Jörg W Mittag Jan 26 '11 at 11:16
  • @Phrogz I noticed that the solution in your question can generate empty subarrays. Try `[0,1,2,3,3,4].split {|e| e % 3 == 0 }`. I'm assuming this is undesirable, but `',a,b,,c,'.split(',')` gives you empty strings too, and passing -1 as the 2nd arg gives you a trailing empty string. – Kelvin Jan 31 '12 at 23:33
  • possible duplicate of [Best way to split arrays into multiple small arrays in ruby](http://stackoverflow.com/questions/5686493/best-way-to-split-arrays-into-multiple-small-arrays-in-ruby). btw best solution is to use `#group_by`. – akostadinov Jul 02 '15 at 12:14
  • @akostadinov That question groups 'similar' values together. This question preserves original array ordering, simply breaking the values apart at some boundary, and discarding that value. – Phrogz Jul 02 '15 at 14:34
  • @Phrogz, if you look at Tapio Saarinen's answer, you'll get the real good answer. You could upvote as it is the best answer. – akostadinov Jul 03 '15 at 17:46
  • @akostadinov No, you still do not understand the difference between this question and that one. Tapio's answer does not answer the need from my question. Look at my sample input and output again. – Phrogz Jul 04 '15 at 21:06

5 Answers

18

Sometimes partition is a good way to do things like that:

(1..6).partition { |v| v.even? } 
#=> [[2, 4, 6], [1, 3, 5]]
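Note that partition sorts every element into one of two arrays; it does not cut the sequence at each match. A minimal sketch of the difference, using a hypothetical helper `split_at` written only for this comparison (it is not part of Ruby core):

```ruby
# partition puts each element into one of two arrays: matches and non-matches.
matches, rest = (1..9).partition { |i| i % 3 == 0 }

# Hypothetical helper for comparison: cut the sequence at each matching
# element and discard that element, as the question asks.
def split_at(enum)
  result = [[]]
  enum.each { |v| yield(v) ? (result << []) : (result.last << v) }
  result.pop if result.last.empty?
  result
end

pieces = split_at(1..9) { |i| i % 3 == 0 }

p matches  # [3, 6, 9]
p rest     # [1, 2, 4, 5, 7, 8]
p pieces   # [[1, 2], [4, 5], [7, 8]]
```

So partition fits when only the two groups matter, not the original boundaries.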
scaryguy
13

I tried golfing it a bit, still not a single method though:

(1..9).chunk{|i|i%3==0}.reject{|sep,ans| sep}.map{|sep,ans| ans}

Or faster:

(1..9).chunk{|i| i%3==0 ? nil : true}.map{|sep,ans| ans}

Also, Enumerable#chunk seems to be Ruby 1.9+, but it is very close to what you want.

For example, the raw output would be:

(1..9).chunk{ |i|i%3==0 }.to_a                                       
=> [[false, [1, 2]], [true, [3]], [false, [4, 5]], [true, [6]], [false, [7, 8]], [true, [9]]]

(The to_a is to make irb print something nice, since chunk gives you an enumerator rather than an Array)
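One detail worth keeping in mind: chunk treats a block result of nil (or :_separator) specially. Those elements are dropped and a new chunk starts after them, so the block has to return nil for the separators, not for the values to keep. A small sketch of that behaviour, where the key `:keep` is an arbitrary name chosen for illustration:

```ruby
# nil marks an element as a separator: it is dropped and ends the current chunk.
# Any other key keeps the element; consecutive equal keys are grouped together.
raw = [0, 1, 2, 3, 3, 4].chunk { |i| i % 3 == 0 ? nil : :keep }.to_a
groups = raw.map { |_, group| group }

p raw     # [[:keep, [1, 2]], [:keep, [4]]]
p groups  # [[1, 2], [4]]
```

Leading and adjacent separators produce no empty sub-arrays here, which differs from what String#split does with empty fields.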


Edit: Note that the above elegant solutions are 2-3x slower than the fastest implementation:

module Enumerable
  def split_by
    result = [a=[]]
    each{ |o| yield(o) ? (result << a=[]) : (a << o) }
    result.pop if a.empty?
    result
  end
end
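Be aware that split_by only pops a trailing empty sub-array; a leading separator or two adjacent separators still leave an empty sub-array in the result. If that is not wanted, a single-pass variant could look like the sketch below. The name `split_by_nonempty` is hypothetical and the method has not been benchmarked against the versions here.

```ruby
module Enumerable
  # Hypothetical variant of split_by that never emits empty groups.
  def split_by_nonempty
    result = []
    current = []
    each do |o|
      if yield(o)
        # Separator: close the current group, but only if it holds something.
        result << current unless current.empty?
        current = []
      else
        current << o
      end
    end
    result << current unless current.empty?
    result
  end
end

leading  = (3..9).split_by_nonempty { |i| i % 3 == 0 }
only_sep = [3, 6, 9].split_by_nonempty { |i| i % 3 == 0 }

p leading   # [[4, 5], [7, 8]]
p only_sep  # []
```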
Phrogz
Mike Tunnicliffe
  • Nice! I hadn't seen `chunk` before. For the record, it's 1.9.2+, but that's wholly acceptable to me. – Phrogz Jan 26 '11 at 02:10
  • Here's a link to the doc: http://ruby-doc.org/core/classes/Enumerable.html#M001523 – Mike Tunnicliffe Jan 26 '11 at 02:30
  • Not surprisingly (due to the extra iterations needed for reject/map) chunk is a good bit slower; I've added a benchmarking 'answer' collecting implementations. – Phrogz Jan 26 '11 at 04:48
  • `(1..10).chunk{|n| n % 3 == 0 ? :_separator : :keep}.map{|_,v| v}` – SwiftMango Jun 26 '12 at 17:11
  • `(1..10).chunk{|n| n%3==0 || nil}.map{|_,v| v}` – Mike Tunnicliffe Aug 21 '12 at 12:27
  • But... the "fastest implementation" is incorrect: give it `(3..9)` and you'll get a leading `[]`. Give it `[3, 6, 9]` as input and it'll give you back `[[], []]`. `result.pop if a.empty?` is wrong/useless: you have to `result.reject!(&:empty?)`, requiring a second traversal and achieving the similar effect as with `compact`, possibly putting you back in the 2x/3x factor range. – Lloeki Feb 27 '14 at 16:27
5

Here are benchmarks aggregating the answers (I'll not be accepting this answer):

require 'benchmark'
a = *(1..5000); N = 1000
Benchmark.bmbm do |x|
  %w[ split_with_inject split_with_inject_no_tap split_with_each
      split_with_chunk split_with_chunk2 split_with_chunk3 ].each do |method|
    x.report( method ){ N.times{ a.send(method){ |i| i%3==0 || i%5==0 } } }
  end
end
#=>                                user     system      total        real
#=> split_with_inject          1.857000   0.015000   1.872000 (  1.879188)
#=> split_with_inject_no_tap   1.357000   0.000000   1.357000 (  1.353135)
#=> split_with_each            1.123000   0.000000   1.123000 (  1.123113)
#=> split_with_chunk           3.962000   0.000000   3.962000 (  3.984398)
#=> split_with_chunk2          3.682000   0.000000   3.682000 (  3.687369)
#=> split_with_chunk3          2.278000   0.000000   2.278000 (  2.281228)

The implementations being tested (on Ruby 1.9.2):

class Array
  def split_with_inject
    inject([[]]) do |a,v|
      a.tap{ yield(v) ? (a << []) : (a.last << v) }
    end.tap{ |a| a.pop if a.last.empty? }
  end

  def split_with_inject_no_tap
    result = inject([[]]) do |a,v|
      yield(v) ? (a << []) : (a.last << v)
      a
    end
    result.pop if result.last.empty?
    result
  end

  def split_with_each
    result = [a=[]]
    each{ |o| yield(o) ? (result << a=[]) : (a << o) }
    result.pop if a.empty?
    result
  end

  def split_with_chunk
    chunk{ |o| !!yield(o) }.reject{ |b,a| b }.map{ |b,a| a }
  end

  def split_with_chunk2
    chunk{ |o| !!yield(o) }.map{ |b,a| b ? nil : a }.compact
  end

  def split_with_chunk3
    chunk{ |o| yield(o) || nil }.map{ |b,a| b && a }.compact
  end
end
Phrogz
  • A bit late to the party, but: these methods aren't entirely comparable, because they don't all return the same results. The first three return something similar to what `String#split` returns (including empty arrays when two subsequent separators are found), whereas `split_with_chunk` and `split_with_chunk2` never return empty arrays, and `split_with_chunk3` still contains the 'grouping' value of chunk. – Confusion Sep 01 '13 at 09:45
1

Here is another one (with a benchmark comparing it to the fastest, split_with_each, from https://stackoverflow.com/a/4801483/410102):

require 'benchmark'

class Array
  def split_with_each
    result = [a=[]]
    each{ |o| yield(o) ? (result << a=[]) : (a << o) }
    result.pop if a.empty?
    result
  end

  def split_with_each_2
    u, v = [], []
    each{ |x| (yield x) ? (u << x) : (v << x) }
    [u, v]
  end
end

a = *(1..5000); N = 1000
Benchmark.bmbm do |x|
  %w[ split_with_each split_with_each_2 ].each do |method|
    x.report( method ){ N.times{ a.send(method){ |i| i%3==0 || i%5==0 } } }
  end
end

                        user     system      total        real
split_with_each     2.730000   0.000000   2.730000 (  2.742135)
split_with_each_2   2.270000   0.040000   2.310000 (  2.309600)
akonsu
1

Other Enumerable methods you might want to consider are each_slice and each_cons.

I don't know how general you want it to be, but here's one way:

>> (1..9).each_slice(3) {|a| p a.size>1?a[0..-2]:a}
[1, 2]
[4, 5]
[7, 8]
=> nil
>> (1..10).each_slice(3) {|a| p a.size>1?a[0..-2]:a}
[1, 2]
[4, 5]
[7, 8]
[10]
kurumi