
I came across this interesting Scala problem and I'm not sure how to solve it:

class TopN {
  def findTopN(n: Int)(stream: Stream[Int]): List[Int] = {
   ???
  }
} 

This is a test of abstract data engineering skills.

The function findTopN(...) in TopN is supposed to find the top N highest unique integers in a presumed endless stream of integers. To process the Stream of Int, you can only hold a few values in memory at any given time. Therefore, a memory-efficient way to process this stream is required.

Edit: Because it's an endless stream, I understand the question as "what are the top N numbers so far?". So you have to maintain in-memory state.

Krzysztof Atłasik
jakstack
  • What would be the definition of "top N elements" without a bound? For me it's always the top N of some known elements; without a limit it makes no sense – cchantep Jun 28 '19 at 16:22
  • Of course you can solve this question with one line of code for a bounded stream, but my understanding for unbounded streams is: what are the top N numbers so far? Make sense? – jakstack Jun 28 '19 at 16:26
  • You always need to set a limit, to know when to stop consuming the stream and then check at that point what the top elements are – cchantep Jun 28 '19 at 16:27
  • My solution, which I'm not sure how to implement, is to maintain a sorted list of N values; each int I see I either add to my sorted list or ignore – jakstack Jun 28 '19 at 16:58
  • It also mentions "abstract data engineering"; is that the same as ADTs? – jakstack Jun 28 '19 at 16:59
  • You still don't answer when to stop, what the termination is, which is a key point in determining the top-n-so-far – cchantep Jun 28 '19 at 19:09

3 Answers


If you have a function topNSoFar:

def topNSoFar(n: Int)(prev: List[Int], next: Int): List[Int] = ???

Then you can run this on the stream like:

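// scanLeft (not foldLeft) lazily emits every intermediate result;
// foldLeft would return a single List and never finish on an endless stream
def findTopN(n: Int)(stream: Stream[Int]): Stream[List[Int]] =
  stream.scanLeft(List.empty[Int])(topNSoFar(n))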

Then just stop where you want.
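
The body of topNSoFar is left open above. A minimal sketch of one possible implementation follows, assuming the previous list is kept sorted in descending order and free of duplicates; the object name TopNSoFarSketch and the helper runningTopN are made up for illustration and are not part of the answer.

```scala
object TopNSoFarSketch {
  // One step of the running computation: merge `next` into the current top-n list.
  // Assumes `prev` is sorted in descending order and contains no duplicates.
  def topNSoFar(n: Int)(prev: List[Int], next: Int): List[Int] =
    if (prev.contains(next)) prev // keep the values unique
    else (next :: prev).sorted(Ordering[Int].reverse).take(n)

  // scanLeft is lazy on a Stream, so each intermediate top-n list can be read
  // without consuming the whole (possibly endless) input.
  def runningTopN(n: Int)(stream: Stream[Int]): Stream[List[Int]] =
    stream.scanLeft(List.empty[Int])(topNSoFar(n))
}
```

For instance, TopNSoFarSketch.runningTopN(2)(Stream.from(1)).take(4).toList evaluates only the first few elements of the endless input. The list never holds more than n + 1 values at a time, so each step costs O(n log n) because of the sort, which is acceptable for small n.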

Karl Bielefeldt

The fact that you want to keep track of the top N "so far" indicates, to me, that accessing a Stream element is to consume it. In other words, it's going to be easier to treat the Stream as an Iterator.

class TopN[A](n: Int, infinite: Stream[A])(implicit ev :Ordering[A]) {

  // reverse priority queue - rpq.head will always be minimum
  private val rpq = collection.mutable.PriorityQueue[A]()(Ordering[A].reverse)
  def sofar :List[A] = rpq.toList.sorted

  // turn infinite Stream to infinite Iterator
  private val itr = infinite.iterator
  def next() :A = {
    val nxt = itr.next()
    if (rpq.size < n) rpq.enqueue(nxt)
    else if (ev.lt(rpq.head, nxt)) {rpq.dequeue();rpq.enqueue(nxt)}
    nxt
  }
  // use next() to implement other methods such as take() or drop()
}
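
The question asks for unique integers, and a priority queue will happily hold the same value more than once. Below is a hedged sketch of a variant that swaps the queue for a sorted set so duplicates are dropped; the class name UniqueTopN and the choice to take an Iterator directly are assumptions made for illustration, not part of the code above.

```scala
import scala.collection.mutable

// Hypothetical variant: keeps at most n distinct values in memory.
class UniqueTopN[A](n: Int, source: Iterator[A])(implicit ev: Ordering[A]) {
  require(n > 0, "n must be positive")

  // Sorted set: `head` is always the smallest retained value,
  // and adding a value that is already present has no effect.
  private val kept = mutable.TreeSet.empty[A]

  // Current top n, largest first.
  def soFar: List[A] = kept.toList.reverse

  // Consume one element and update the retained set.
  def next(): A = {
    val nxt = source.next()
    if (!kept.contains(nxt)) {
      if (kept.size < n) kept += nxt
      else if (ev.lt(kept.head, nxt)) { kept -= kept.head; kept += nxt }
    }
    nxt
  }
}
```

Calling next() repeatedly and reading soFar whenever a snapshot is needed gives the "top N so far" at that point, with memory bounded by n.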
jwvh

As others have already mentioned, the problem as described can't be solved, because you can't know the top 10 numbers of an endless stream. But if you change the signature of your function to

def findTopN(n: Int)(stream: Stream[Int]): Stream[List[Int]]

then the task becomes turning an infinite stream of random numbers into an infinite stream of lists holding the top n numbers seen so far, and we could write it like:

def findTopN(n: Int)(stream: Stream[Int]): Stream[List[Int]] = {
    stream.scanLeft(List.empty[Int])((list, next) => {
      (next :: list).sorted(Ordering[Int].reverse) match {
        case m if m.size > n => m.init
        case m => m
      }
    })
}

import scala.util.Random

val random = Random

val randomStream = Stream.iterate(random.nextInt(1000))(_ => random.nextInt(1000))

findTopN(5)(randomStream).take(10).toList.foreach(println)

And it would yield something like:

List()
List(868)
List(868, 695)
List(868, 695, 214)
List(868, 695, 453, 214)
List(973, 868, 695, 453, 214)
List(973, 868, 695, 453, 255)
List(973, 868, 695, 453, 271)
List(973, 868, 759, 695, 453)
List(973, 868, 759, 695, 466)

I guess the proper technical term for this would be calculating a running top N.
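
The running top N above re-sorts the list on every element and keeps duplicate values, while the question asks for unique integers. One possible alternative, sketched here under the assumption that an Iterator is an acceptable input type (a Stream memoizes its elements for as long as its head is referenced), uses an immutable sorted set; the object name RunningTopN is made up for illustration.

```scala
import scala.collection.immutable.TreeSet

object RunningTopN {
  // Emits the top n distinct values seen so far, largest first,
  // once before any input and once after each consumed element.
  def findTopN(n: Int)(source: Iterator[Int]): Iterator[List[Int]] =
    source
      .scanLeft(TreeSet.empty[Int]) { (top, x) =>
        val grown = top + x // a set ignores values it already holds
        if (grown.size > n) grown - grown.head else grown // evict the smallest
      }
      .map(_.toList.reverse)
}
```

With a random source it can be driven much like the example above, for instance RunningTopN.findTopN(5)(Iterator.continually(scala.util.Random.nextInt(1000))).take(10).foreach(println). Each step touches the set in O(log n) time and at most n + 1 values are held at once.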

Krzysztof Atłasik