3

In Kotlin 1.4, I have a Sequence of Sequences. The items in the inner Sequences are already sorted by one of their properties (id).

I want to merge them into one Sequence which would also be sorted by the same property.

It's quite a trivial algorithm (always take the smallest of the next items across all sequences). But I was wondering:

Does the Kotlin standard library have a stateless merging operation for pre-sorted Sequences? Something like Sequence<Sequence<T>>.flattenSorted(c: Comparator) or so.

Edit:

As some have correctly assumed from the context, I am not looking for flattenSorted(), which is stateful, does not leverage the pre-sort, and would not perform well for, say, 100 sequences of 1_000_000 elements. I've reworded the question.

Ondra Žižka
  • [Similar question, for List](https://stackoverflow.com/questions/59000336/kotlin-merge-multiple-lists-then-ordering-interleaved-merge-list) – Ondra Žižka Nov 25 '20 at 10:49
  • "It's quite trivial algorithm" what's your implementation? What if the sequences are `(1, 5, 7)` and `(2, 3, 4)`? – Adam Millerchip Nov 25 '20 at 15:54
  • @AdamMillerchip, not sure what you mean by "What if the sequences are...". My implementation is below, and as you can see, the algorithm really is trivial; the non-trivial part is handling the iterators properly. – Ondra Žižka Nov 27 '20 at 09:54
  • I was thinking that you meant it would do multiple passes, resulting in something like `(1, 2, 3, 5, 4, 7)`, but now I understand that you want to compare all the sequences every time you take an element. – Adam Millerchip Nov 27 '20 at 11:14
  • @AdamMillerchip, not all sequences, just their "next" element. – Ondra Žižka Nov 27 '20 at 20:49
  • Maybe a sequence for the outer layer is not the best choice, because you need to access it non-sequentially. – Adam Millerchip Nov 28 '20 at 00:17
  • Sequences are evaluated lazily, so until you terminate them into a target collection, no "pre-sorting" is actually performed on any of sequence items. Then (in case I've got you right) you just need to `sequenceOfSequences.flatten().sortedWith(/* comparator */).toList()` – Nikolai Shevchenko Dec 09 '20 at 11:49
  • @NikolaiShevchenko, the sequences are already sorted. They are results of 3 DB queries sorted by an index. That's what I meant by `I have a Sequence of Sequences of items sorted by ...`. But I have reworded it to be clearer. – Ondra Žižka Dec 09 '20 at 15:16

4 Answers

2

This is similar to the answer by @Ondra Žižka; however, this variant is IMO easier to grok and use, because:

  1. The comparator passed to the method has a type that does not leak implementation details: it compares just the relevant value T rather than Map.Entry<Iterator<T>, T>.

  2. It uses the sequence builder to yield values into the returned sequence cleanly.

  3. Other minor changes to improve readability.

fun <T> List<Sequence<T>>.mergeSorted(comparator: Comparator<T>): Sequence<T> {
  val iteratorToCurrentValues = map { it.iterator() }
    .filter { it.hasNext() }
    .associateWith { it.next() }
    .toMutableMap()

  val c: Comparator<Map.Entry<Iterator<T>, T>> = Comparator.comparing({ it.value }, comparator)

  return sequence {
    while (iteratorToCurrentValues.isNotEmpty()) {
      val smallestEntry = iteratorToCurrentValues.minWithOrNull(c)!!

      yield(smallestEntry.value)

      if (!smallestEntry.key.hasNext())
        iteratorToCurrentValues.remove(smallestEntry.key)
      else
        iteratorToCurrentValues[smallestEntry.key] = smallestEntry.key.next()
    }
  }
}
Raman
1

If there is no such thing: I've implemented this myself. For X Sequences it needs just a Map of size X, and for a fixed X it runs in O(n), since each step scans the X current heads (O(n·X) overall).

fun <T> Sequence<Sequence<T>>.mergeSorted(comparator: Comparator<Map.Entry<Iterator<T>, T>>): Sequence<T> {

    // A map of iterators to their next value.
    val nexts = this
        .map { it.iterator() }
        .filter { it.hasNext() }
        .associateBy({ it }, { it.next() })
        .toMutableMap()
   
    return object : Sequence<T> {
        override fun iterator() = object : Iterator<T> {
            override fun hasNext() = nexts.isNotEmpty()

            override fun next(): T {
                val smallest = nexts.minWithOrNull(comparator)
                if (smallest == null)
                    throw NoSuchElementException("No more items. Did you forget to call hasNext()?")

                val toReturn = smallest.value
                if (!smallest.key.hasNext())   // This source is depleted.
                    nexts.remove(smallest.key)
                else
                    nexts[smallest.key] = smallest.key.next()
                
                return toReturn
            }
        }
    }
}
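
A possible variant, not from the answer above: when X is large, the linear minWithOrNull scan on every step can be replaced by a java.util.PriorityQueue, which brings each step down to O(log X). The sketch below assumes a JVM target. The name mergeSortedWithHeap and the Row class are made up for illustration, and the comparator works on T directly. Elements that compare as equal may come out in any order.

```kotlin
import java.util.PriorityQueue

// Hypothetical helper (not part of the Kotlin standard library).
fun <T> Sequence<Sequence<T>>.mergeSortedWithHeap(comparator: Comparator<in T>): Sequence<T> = sequence {
    // Order heap entries by the value each source produced last.
    val byHead = Comparator<Pair<T, Iterator<T>>> { a, b -> comparator.compare(a.first, b.first) }
    val heap = PriorityQueue<Pair<T, Iterator<T>>>(byHead)

    // Iterators are opened inside the builder, so nothing is read until the result is consumed.
    for (source in this@mergeSortedWithHeap) {
        val iterator = source.iterator()
        if (iterator.hasNext()) heap.add(iterator.next() to iterator)
    }

    while (heap.isNotEmpty()) {
        // Take the smallest head, emit it, and refill from the same source if it has more.
        val (value, iterator) = heap.poll()
        yield(value)
        if (iterator.hasNext()) heap.add(iterator.next() to iterator)
    }
}

// Illustrative element type with an id to sort by.
data class Row(val id: Int, val label: String)

fun main() {
    val sources = sequenceOf(
        sequenceOf(Row(1, "a"), Row(5, "a"), Row(7, "a")),
        sequenceOf(Row(2, "b"), Row(3, "b"), Row(9, "b")),
        emptySequence<Row>()
    )
    val merged = sources.mergeSortedWithHeap(compareBy<Row> { it.id })
    println(merged.map { it.id }.toList()) // [1, 2, 3, 5, 7, 9]
}
```

The heap never holds more than one element per source, so memory stays proportional to X, as with the Map-based version.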
Ondra Žižka
0

The algorithm I assume you are looking for is the linear-time merge step of mergesort.

fun <E : Comparable<E>> mergeSeq(a_ : Sequence<E>, b_: Sequence<E>): Sequence<E> {
    val l = mutableListOf<E>()
    var a = a_
    var b = b_
    var a_head = a.firstOrNull()
    var b_head = b.firstOrNull()
    while (a_head != null && b_head != null) {
        if (a_head < b_head) {
            l.add(a_head)
            a = a.drop(1)
            a_head = a.firstOrNull()
        } else {
            l.add(b_head)
            b = b.drop(1)
            b_head = b.firstOrNull()
        }
    }
    return l.asSequence() + a + b
}

fun main() {
    val a = sequenceOf(1, 5, 7)
    val b = sequenceOf(2, 3, 9)
    println(mergeSeq(a, b).toList())
}

I wrote this implementation, which does precisely that. Note that the line var a = a_ is needed because function parameters are all val in Kotlin, and we cannot re-assign a val. The objects you are passing also need to implement compareTo so that the line a_head < b_head works.

This also merges only 2 sequences, not n of them. For n sequences, you just apply this function repeatedly, e.g. as a fold/reduce:

fun <E : Comparable<E>> mergeSeqList(l : List<Sequence<E>>): Sequence<E> =
        l.fold(sequenceOf(), ::mergeSeq)
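
As a hedged alternative sketch, not the code above: mergeSeq calls firstOrNull() after every drop(1), which re-opens the source each time and collects the result into a list, so it needs re-iterable sequences and is not lazy. The version below reads each source through a single iterator and yields lazily. It then merges the list pairwise, level by level, so each element passes through roughly log2(m) two-way merges instead of up to m with a left fold. The names mergeLazy and mergeBalanced are made up for illustration.

```kotlin
// Hypothetical helper: lazily merges two sorted sequences, reading each source exactly once.
fun <E : Comparable<E>> mergeLazy(a: Sequence<E>, b: Sequence<E>): Sequence<E> = sequence {
    val left = a.iterator()
    val right = b.iterator()
    if (!left.hasNext()) { yieldAll(right); return@sequence }
    if (!right.hasNext()) { yieldAll(left); return@sequence }

    var x = left.next()
    var y = right.next()
    while (true) {
        if (x <= y) {
            yield(x)
            // Left side is exhausted: flush the pending right head and the rest of the right side.
            if (!left.hasNext()) { yield(y); yieldAll(right); return@sequence }
            x = left.next()
        } else {
            yield(y)
            if (!right.hasNext()) { yield(x); yieldAll(left); return@sequence }
            y = right.next()
        }
    }
}

// Hypothetical helper: merges neighbours level by level, like bottom-up mergesort.
fun <E : Comparable<E>> mergeBalanced(sources: List<Sequence<E>>): Sequence<E> {
    if (sources.isEmpty()) return emptySequence()
    var level = sources
    while (level.size > 1) {
        level = level.chunked(2) { pair ->
            // An odd element at the end of a level is carried over unchanged.
            if (pair.size == 2) mergeLazy(pair[0], pair[1]) else pair[0]
        }
    }
    return level.single()
}

fun main() {
    val sources = listOf(
        sequenceOf(1, 5, 7),
        sequenceOf(2, 3, 9),
        sequenceOf(4, 6, 8)
    )
    println(mergeBalanced(sources).toList()) // [1, 2, 3, 4, 5, 6, 7, 8, 9]
}
```

Because mergeLazy opens each source only once, it also works with sequences that can be iterated a single time, such as those wrapped with constrainOnce().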
xjcl
  • Nice, but this is not sorting a sequence of sequences, but 2 sequences, and applying it repeatedly is not too practical - what if I have 1000 sequences? – Ondra Žižka Nov 27 '20 at 09:45
  • It has a part at the bottom for a `List<Sequence<E>>`, which you can easily change to a `Sequence<Sequence<E>>`. – xjcl Nov 27 '20 at 10:40
  • My solution would merge `m` sequences of length `k` in time `O(m^2 k)` which is worse than if you did it in binary fashion (as mergesort does), which is `O(m log m k)`. But I suggest trying my solution first to see if it handles your use case. – xjcl Nov 27 '20 at 10:43
0

I have no idea if this works but...

What if you first create a sorted set with a comparator and then keep adding all sub-sequences to this set?

The comparator should handle the case where two ids are equal: either drop one of the items or break the tie on some other field.
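
A minimal sketch of that idea, assuming a JVM target and using java.util.TreeSet. The helper name mergeViaSortedSet and the Item class are made up for illustration. Note that the set holds every element in memory before anything is returned, and that it does not make use of the inputs already being sorted.

```kotlin
import java.util.TreeSet

// Hypothetical helper: pours every sub-sequence into one sorted set.
fun <T> mergeViaSortedSet(sources: Sequence<Sequence<T>>, comparator: Comparator<in T>): List<T> {
    val sorted = TreeSet<T>(comparator)
    // Elements the comparator reports as equal to an existing one are silently dropped.
    for (source in sources) sorted.addAll(source)
    return sorted.toList()
}

// Illustrative element type: id is the sort key, source is a tie-breaker.
data class Item(val id: Int, val source: String)

fun main() {
    val sources = sequenceOf(
        sequenceOf(Item(1, "a"), Item(5, "a")),
        sequenceOf(Item(1, "b"), Item(3, "b"))
    )

    // Comparing by id alone keeps only one of the two items with id 1.
    val byIdOnly = mergeViaSortedSet(sources, compareBy<Item> { it.id })
    println(byIdOnly.size) // 3

    // Breaking ties on a second field keeps both.
    val byIdThenSource = mergeViaSortedSet(sources, compareBy<Item>({ it.id }, { it.source }))
    println(byIdThenSource.map { "${it.id}${it.source}" }) // [1a, 1b, 3b, 5a]
}
```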

CapSel