
I have a collection of objects that I want to break up into a collection of collections, where each sequential group of 3 elements is in one collection. For example, if I have

def l = [1,4,2,4,5,9]

I want to turn this into:

def r = [[1,4,2], [4,5,9]]

I'm doing it now by iterating over the collection and breaking it up, but I then need to pass those 'groups' into a parallelized function that processes them. It would be nice to eliminate this O(n) pre-processing work and just say something like

l.slice(3).collectParallel { subC -> process(subC) }

I've found the step method on the Range class, but it looks like that only acts on the indices. Any clever ideas?

Update: I don't think this is a duplicate of the referenced link, although it's very close. As suggested below, it's more of an iterator-type thing I'm looking for. The sub-collections will then be passed into a GPars collectParallel. Ideally I wouldn't need to allocate an entire new collection.

Bobby
  • possible duplicate of [Groovy built-in to split an array into equal sized subarrays?](http://stackoverflow.com/questions/2924395/groovy-built-in-to-split-an-array-into-equal-sized-subarrays) – Michael Easter May 03 '11 at 18:24
  • I agree that this isn't an exact duplicate because of the lazy nature of what you're looking for. – Ted Naleid May 04 '11 at 02:34
  • I wouldn't call this a slice, but rather a collation. I thought a slice was something more like this: http://www.webquills.net/web-development/perl/perl-5-hash-slices-can-replace.html – sf_jeff Sep 11 '17 at 17:20

4 Answers


Check out Groovy 1.8.6. There is a new collate method on List.

def list = [1, 2, 3, 4]
assert list.collate(4) == [[1, 2, 3, 4]] // gets you everything
assert list.collate(2) == [[1, 2], [3, 4]] // splits evenly
assert list.collate(3) == [[1, 2, 3], [4]] // won't split evenly, remainder in last list

Take a look at the Groovy List documentation for more info because there are a couple of other params that give you some other options, including dropping the remainder.
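
As a rough illustration of those extra parameters, here is a sketch using the `collate(size, keepRemainder)` and `collate(size, step, keepRemainder)` overloads as I understand them from the Groovy docs (worth double-checking against the version you run):

```groovy
def list = [1, 2, 3, 4]

// keepRemainder = false drops a trailing group that is shorter than size
assert list.collate(3, false) == [[1, 2, 3]]
assert list.collate(3, true) == [[1, 2, 3], [4]]

// a step smaller than size produces overlapping (sliding) groups
assert list.collate(3, 1, false) == [[1, 2, 3], [2, 3, 4]]
assert list.collate(3, 1, true) == [[1, 2, 3], [2, 3, 4], [3, 4], [4]]
```

Note that collate builds new lists for each group, so it does not avoid the allocation the question asks about.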

As far as your parallel processing goes, you can cruise through the lists with GPars.

import groovyx.gpars.GParsPool

def list = [1, 2, 3, 4, 5]
GParsPool.withPool {
  // each sub-list produced by collate is handed to the pool
  list.collate(2).eachParallel {
    println it
  }
}
benkiefer

If I understand you correctly, you're currently copying the elements from the original collection into the sub-collections. For more suggestions along those lines, check out the answers to the following question: Split collection into sub collections in Groovy

It sounds like what you're instead looking for is a way for the sub-collections to effectively be a view into the original collection. If that's the case, check out the List.subList() method. You could either loop over the indices from 0 to size() in increments of 3 (or whatever slice size you choose) or you could get fancier and build an Iterable/List which would hide the details from the caller. Here's an implementation of the latter, inspired by Ted's answer.

class Slicer implements Iterator {
  private List backingList
  private int sliceSize
  private int index

  Slicer(List backingList, int sliceSize) {
    this.backingList = backingList
    this.sliceSize = sliceSize
  }

  Object next() {
    if (!hasNext()) {
      throw new NoSuchElementException()
    }

    // subList returns a view of backingList, so no elements are copied;
    // the final slice may be shorter than sliceSize
    def ret = backingList.subList(index, Math.min(index + sliceSize, backingList.size()))
    index += sliceSize
    return ret
  }

  boolean hasNext() {
    return index < backingList.size()
  }

  void remove() {
    throw new UnsupportedOperationException() //I'm lazy ;)
  }
}
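
The simpler option mentioned above, looping over the indices in increments of the slice size, could look roughly like this. It is only a sketch: the sample list and the `sliceSize` and `views` names are made up for illustration.

```groovy
def list = [1, 4, 2, 4, 5, 9, 7]
int sliceSize = 3   // illustrative value

def views = []
for (int i = 0; i < list.size(); i += sliceSize) {
    // subList returns a view backed by the original list, not a copy
    views << list.subList(i, Math.min(i + sliceSize, list.size()))
}

assert views == [[1, 4, 2], [4, 5, 9], [7]]

// a non-structural change to the original list shows through the view
list[0] = 100
assert views[0] == [100, 4, 2]
```

One caveat with views: adding or removing elements in the backing list after the sub-lists are created invalidates them, so this only fits when the original list is not structurally modified while the slices are in use.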
Matt Passell
  • Thanks for the info - you're right, it's more of the 'view' that I want to create. There's no need to re-allocate any new objects. The approach in Michael's link above works just fine, just as my existing code does. This 'Iterable' implementation sounds about right - wondering how much code it would take to do - need to define a new implementation of Iterable? Subclass Iterator? – Bobby May 03 '11 at 19:01
  • I think you might have an off-by-one error in your implementation of hasNext(). Try with lists with 0, 1, 2, 3, 4, 5 elements and a sliceSize of 4. – jabley Jul 08 '11 at 14:28
  • @jabley You're right! I'm not sure what led me to put that in there. I'll go remove it. – Matt Passell Jul 09 '11 at 17:04

I like both solutions but here is a slightly improved version of the first solution that I like very much:

class Slicer implements Iterator {
  private List backingList
  private int sliceSize
  private int index

  Slicer(List backingList, int sliceSize) {
    this.backingList = backingList
    int ss = sliceSize

    // negative sliceSize = -N means: split the list into N equal (or near-equal) pieces
    if (sliceSize < 0) {
      ss = -sliceSize
      ss = (int) ((backingList.size() + ss - 1) / ss)
    }
    this.sliceSize = ss
  }

  Object next() {
    if (!hasNext()) {
      throw new NoSuchElementException()
    }

    def ret = backingList.subList(index, Math.min(index + sliceSize, backingList.size()))
    index += sliceSize
    return ret
  }

  boolean hasNext() {
    // true while at least one element remains, including a trailing single element
    return index < backingList.size()
  }

  void remove() {
    throw new UnsupportedOperationException() //I'm lazy ;)
  }

  // copies each slice into its own ArrayList
  List asList() {
    this.collect { new ArrayList(it) }
  }

  List flatten() {
    backingList.asImmutable()
  }

}

// ======== TESTS

    def a = [1,2,3,4,5,6,7,8];
    assert  [1,2,3,4,5,6,7,8] == a;
    assert [[1, 2], [3, 4], [5, 6], [7, 8]] ==  new Slicer(a,2).asList(); 
    assert [[1,2,3], [4,5,6], [7,8]] == (new Slicer(a,3)).collect { it } // alternative to asList but inner items are subList
    assert [3, 2, 1, 6, 5, 4, 8, 7] == ((new Slicer(a,3)).collect { it.reverse() } ).flatten()

    // show flatten iterator
    //new Slicer(a,2).flattenEach { print it }
    //println ""

    // negative slice size: split into N pieces; in this example we split into 2 pieces
    assert [[1, 2, 3, 4], [5, 6, 7, 8]] ==  new Slicer(a,-2).collect { it as List }  // same as asList
    assert [[1, 2, 3], [4, 5, 6], [7, 8]] == new Slicer(a,-3).asList()
    //assert a == (new Slicer(a,3)).flattenCollect { it } 
    assert [9..10, 19..20, 29..30] == ( (new Slicer(1..30,2)).findAll { slice -> !(slice[1] % 10) } )
    assert [[9, 10], [19, 20], [29, 30]] == ( (new Slicer(1..30,2)).findAll { slice -> !(slice[1] % 10) }.collect { it.flatten() } )

    println( (new Slicer(1..30,2)).findAll { slice -> !(slice[1] % 10) } )
    println( (new Slicer(1..30,2)).findAll { slice -> !(slice[1] % 10) }.collect { it.flatten() } )
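
To make the negative-size arithmetic in the constructor concrete, here is a small sketch of the same ceiling division. `sliceSizeFor` is a hypothetical helper introduced only for this illustration; it is not part of the class above.

```groovy
// hypothetical helper: the slice size used when a list of listSize
// elements is asked to be split into `pieces` pieces
int sliceSizeFor(int listSize, int pieces) {
    (listSize + pieces - 1).intdiv(pieces)   // ceiling of listSize / pieces
}

assert sliceSizeFor(8, 2) == 4
assert sliceSizeFor(8, 3) == 3   // slices of 3, 3 and 2 elements

// 9 elements asked for 4 pieces: the slice size is 3, which yields only 3 slices
assert sliceSizeFor(9, 4) == 3
assert (1..9).toList().collate(sliceSizeFor(9, 4)).size() == 3
```

So the "N pieces" reading appears to hold only approximately: rounding the slice size up can produce fewer than N slices for some list sizes.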

There isn't anything built in to do exactly what you want, but if we @Delegate calls to the native list's iterator, we can write our own class that works just like an Iterator and returns the chunks you're looking for:

class Slicer {
    protected Integer sliceSize 
    @Delegate Iterator iterator

    Slicer(objectWithIterator, Integer sliceSize) {
        this.iterator = objectWithIterator.iterator()
        this.sliceSize = sliceSize
    }

    Object next() {
        List currentSlice = []
        while(hasNext() && currentSlice.size() < sliceSize) {
            currentSlice << this.iterator.next()
        }
        return currentSlice
    }
}

assert [[1,4,2], [4,5,9]] == new Slicer([1,4,2,4,5,9], 3).collect { it }

Because it has all of the methods that a normal Iterator does, you get the Groovy syntactic sugar methods for free, with lazy evaluation, on anything that has an iterator() method, like a range:

assert [5,6] == new Slicer(1..100, 2).find { slice -> slice.first() == 5 }

assert [[9, 10], [19, 20], [29, 30]] == new Slicer(1..30, 2).findAll { slice -> !(slice[1] % 10) }
Ted Naleid
  • I like your solution, but it still feels like making copies rather than having sub-lists that are views of the original list. I'm trying to figure out if there's some way to combine your approach with List.subList(). – Matt Passell May 04 '11 at 15:15
  • I might not be understanding your comment, but the code above isn't making a copy, it's creating a lazy list iterator that's delegating to the original list's native iterator. No list copying is happening. I'm not sure how you could create a lazy version of your list that something like collectParallel could use. There are lots of non-lazy versions (with something like inject), but nothing lazy. If you come up with something, I'd like to see it :). – Ted Naleid May 04 '11 at 23:33
  • My thoughts finally coalesced. Your solution is perfect when all you know about the backing object is that it's an Iterable. However, if you know that it's a RandomAccessList (such as an ArrayList), you could instead maintain a running index in Slicer and have your next methods return `backingList.subList(index, index+sliceSize)`. It wouldn't make much of a difference when sliceSize is 3, but for 1000, you'd avoid allocating space for each slice and the time spent adding the items into the slice. Make more sense? – Matt Passell May 05 '11 at 01:36
  • Yep, I get what you're saying now. You're not talking about worrying that there's a full copy of the list, but a copy of the subset items that are in the slice section. You're asking for an optimization that just returns one item at a time to the caller in the view. You'd need to return a proxy object that also delegates the getAt to the original list. – Ted Naleid May 05 '11 at 02:02
  • That's basically what I was getting at. See my edited answer for a concrete example. – Matt Passell May 05 '11 at 03:33