1

I am new to sequences, so I may have done something (more or less) terribly wrong, but I have a questiion:

I have written two functions:

fun isPrimeNumber1(number: Int): Boolean {
    if (number <= 1) return false
    for (divider in 2 .. number / 2) {
        if ( number % divider == 0 ) return false
    }
    return true
}

and

fun isPrimeNumber2(number: Int): Boolean {
    if (number <= 1) return false
    return !(2 .. number / 2).asSequence().map { it }.any { number % it == 0 }
}

Now I am running a test written in kotest, in which both functions recieve Int.MAX_VALUE as number.

class MyTestPrime : FunSpec({
    context("Prime numbers") {
        test("Should return true if number is prime (1st fun)") {
            isPrimeNumber1(Int.MAX_VALUE) shouldBe true
        }
        test("Should return true if number is prime (2nd fun)") {
            isPrimeNumber2(Int.MAX_VALUE) shouldBe true
        }
    }
})

Execution time of the isPrimeNumber1() function is around 3,5 seconds, and for the 2nd function isPrimeNumber2() - around 8,5 seconds.

Why is that so? Am I missing something about sequences? Or maybe my code does its purpose in a correct but very unoptimal way?

hc0re
  • 1,806
  • 2
  • 26
  • 61
  • Show the test, so that we can discount the possibility that the difference is not derived from the way in which you ran the test (i.e. JVM cold start vs warm start) – Michael Oct 18 '20 at 12:37
  • As the @Michael said, use https://github.com/openjdk/jmh to do some benchmarks, one unit test tells nothing about the performance of code – szymon_prz Oct 18 '20 at 12:41
  • 2
    Shouldn't you sequence-version be the other way around, i.e. return true if `all` are `!= 0` instead of `any` is `== 0`? (Edit: Did not see the `!`) Also, that `map` does not seem to do anything useful. – tobias_k Oct 18 '20 at 12:43
  • @Michael I have added the test cases – hc0re Oct 18 '20 at 12:45
  • Also, as with Java streams, I'd not expect the sequence-version to be any faster in the first place, e.g. it has to call a function in each step instead of just executing a statement. – tobias_k Oct 18 '20 at 12:46
  • 1
    @Michael - corrected the typo. The function using sequence is the number 2 – hc0re Oct 18 '20 at 12:51
  • In this scenario you shouldn't use a sequence in the first place. Best performance comes with best readability: `... return (2..number/2).none { number % it == 0}` – Frank Neblung Oct 26 '20 at 09:08

2 Answers2

6

It is expected. The variant with sequence creates an Iterator object, and calls .hasNext() and .next() functions for each element.

Since an Iterator works with objects and not primitives, all your ints are boxed via Integer::valueOf calls. (Note: .map { it } step is redundant).

I ran both functions via Java Flight Recorder in IntelliJ Idea, and we can see that the sequence variant causes many more function calls compared to the other variant.

isPrimeNumber1:

isPrimeNumber1

isPrimeNumber2:

isPrimeNumber2

As you can see, the isPrimeNumber2 variant causes many more functions to be called under the hood, and therefore is impacted by their overhead.

Another way of inspecting it is to decompile both functions' bytecode to Java. It can give you better insight on what is going on under the hood. Here's both of the functions decompiled (again using IntelliJ):

private static final boolean isPrimeNumber1(int number) {
  if (number <= 1) {
    return false;
  } else {
    int divider = 2;
    int var2 = number / 2;
    if (divider <= var2) {
      while (true) {
        if (number % divider == 0) {
          return false;
        }

        if (divider == var2) {
          break;
        }

        ++divider;
      }
    }

    return true;
  }
}

private static final boolean isPrimeNumber2(int number) {
  if (number <= 1) {
    return false;
  } else {
    byte var1 = 2;
    Sequence $this$any$iv =
        SequencesKt.map(
            CollectionsKt.asSequence((Iterable) (new IntRange(var1, number / 2))),
            (Function1) null.INSTANCE);
    int $i$f$any = false;
    Iterator var3 = $this$any$iv.iterator();

    boolean var10000;
    while (true) {
      if (var3.hasNext()) {
        Object element$iv = var3.next();
        int it = ((Number) element$iv).intValue();
        int var6 = false;
        if (number % it != 0) {
          continue;
        }

        var10000 = true;
        break;
      }

      var10000 = false;
      break;
    }

    return !var10000;
  } 
}

Final Notes: As others have mentioned, to get meaningful performance measurements, you need to use a tool like jmh. However, as a rule of thumb, simpler language constructs, such as a regular for/while loop over a sequence, tend to have less overhead due to the lower level of abstraction they offer.

Utku Özdemir
  • 7,390
  • 2
  • 52
  • 49
  • At my system with kotlin-1.4.10 the following construct performs even better (2650ms) than the for-loop (2793ms). note: contains no sequence at all. `... return (2..number/2).none { number % it == 0}` – Frank Neblung Oct 26 '20 at 08:58
3

Utku's answer covers the why (yeah, the sequence code is much less optimal here) but generally speaking - Sequences are meant to be a complement to Iterables. They use the same functions, and you can chain them into processing pipelines.

The difference is that sequences are executed lazily, each item passing through the full pipeline before the next is handled, and items are only processed when they need to be.

For example, check this extremely good and plausible code:

import kotlin.system.measureTimeMillis

fun isMagicNumber(num: Double) = num == 10000.0

fun main(args: Array<String>) {
    val numberStrings = (1..25000).map(Int::toString)

    val itertime = measureTimeMillis {
        numberStrings
            .map(String::toInt).map { it * 2.0 }.first(::isMagicNumber)
    }
    println("Iterable version: $itertime ms")
    
    val seqtime = measureTimeMillis {
        numberStrings.asSequence()
            .map(String::toInt).map { it * 2.0 }.first(::isMagicNumber)
    }
    print("Sequence version: $seqtime ms")
}

Starting with a list of 25,000 numbers, as Strings, the pipeline is

  • map to an Int
  • double it (converting to a Double)
  • get the first one that meets the condition (equals 10,000, in this case)

The Iterable version does each step for the entire list, before moving onto the next step:

List<String> -> List<Int> -> List<Double> -> find element

It creates a new list each time (taking up memory), and it doesn't even start checking elements until it's made three lists and can start going over the last one - that's the earliest it can return a result.

Sequences do this:

String -> Int -> Double -> check

for each item. As soon as it hits the item that meets the check condition, it's done. That's gotta be way faster right!!!

Iterable version: 41 ms
Sequence version: 111 ms

Ah! Well. Nevertheless,

Turns out sequences have overhead (which is why a really basic for loop will obliterate them if you can write one), and also computers are pretty darn good at iterating over things - there are lots of optimisations under the hood, creating new arrays and iterating over them can be faster than using linked lists, etc. Which is why you really need to profile what you're doing, if efficiency is what you're going for.


What if the source list (and therefore all the additional lists the Iterable version creates) was 10x bigger, 250,000 items?

Iterable version: 260 ms
Sequence version: 155 ms

Oh hello, now we're getting somewhere. Turns out all that overhead starts to pile up after a while, and being able to exit early becomes important, and sequences start to get more efficient.

And this is just measuring time - you need to look at memory use too. Building huge lists can quickly use a lot of memory, or even become unrunnable (if you're doing Project Euler-levels of "make this work for a bajiliion kazillion items"). Sequences and their one-thing-at-a-time approach can keep the whole thing workable, and completable before the heat death of the universe

Also sequences can be infinite! Which limits where you can use them, or how efficient some operations are (last() needs to run through the entire sequence), but it can also be used in situations where a fixed-size collection won't work.


So yeah, use the right tool for the job, and make sure that if efficiency is important, you're actually using the version that gives the best results. But sometimes readability and composability are more important, and being able to chain operations to do a thing is better than a mean and lean for loop

cactustictacs
  • 17,935
  • 2
  • 14
  • 25