1

When I apply the Scala predefined identity function to a collection using the map method the original collection is returned unchanged. However, is the compiler smart enough to simply return the unchanged collection as an O(1) operation? Or will the identity function still be applied to each element in the original collection yielding a O(n) operation?

davidrpugh
  • 4,363
  • 5
  • 32
  • 46

4 Answers4

4

It is pretty straightforward to check that this is not the case. Start by making a test file with possibly optimized forms and compile it using scalac (with or without -optimise)

/// TestMap.scala
object TestMap {
  def mapOption[T](o: Option[T]): Option[T] = o.map(identity)
  def mapList[T](l: List[T]): List[T] = l.map(identity)
  def mapSeq[T](l: Seq[T]): Seq[T] = l.map(identity)
}

Then, checking javap -c TestMap.class you can see that nothing got optimized past specializing map to mapSeq, mapList, or mapOption:

Compiled from "TestMap.scala"
public final class TestMap {
  public static <T extends java/lang/Object> scala.collection.Seq<T> mapSeq(scala.collection.Seq<T>);
    Code:
       0: getstatic     #16                 // Field TestMap$.MODULE$:LTestMap$;
       3: aload_0       
       4: invokevirtual #18                 // Method TestMap$.mapSeq:(Lscala/collection/Seq;)Lscala/collection/Seq;
       7: areturn       

  public static <T extends java/lang/Object> scala.collection.immutable.List<T> mapList(scala.collection.immutable.List<T>);
    Code:
       0: getstatic     #16                 // Field TestMap$.MODULE$:LTestMap$;
       3: aload_0       
       4: invokevirtual #22                 // Method TestMap$.mapList:(Lscala/collection/immutable/List;)Lscala/collection/immutable/List;
       7: areturn       

  public static <T extends java/lang/Object> scala.Option<T> mapOption(scala.Option<T>);
    Code:
       0: getstatic     #16                 // Field TestMap$.MODULE$:LTestMap$;
       3: aload_0       
       4: invokevirtual #26                 // Method TestMap$.mapOption:(Lscala/Option;)Lscala/Option;
       7: areturn  

More simply, this sort of optimization just doesn't extend well in languages with side-effects (on the other hand, in Haskell, this sort of thing happens regularly). Should the compiler optimise l.map(x => { println(x); x }) to l or not for example?

Alec
  • 31,829
  • 7
  • 67
  • 114
2

Rex Kerr's handy Thyme utility confirms Alec's findings. The identity runtime is roughly proportional to the collection size.

val smallC = Vector.tabulate(90)(_*2)
val bigC = Vector.tabulate(900)(_*2)

val th = ichi.bench.Thyme.warmed(verbose = print)
th.pbenchOffWarm("A vs. B")(th.Warm(smallC.map(identity)))(th.Warm(bigC.map(identity)))

Benchmark comparison (in 4.694 s): A vs. B
Significantly different (p ~= 0)
  Time ratio:    9.31267   95% CI 9.25599 - 9.36935   (n=20)
    First     1.492 us   95% CI 1.487 us - 1.496 us
    Second    13.89 us   95% CI 13.82 us - 13.96 us
jwvh
  • 50,871
  • 7
  • 38
  • 64
2

When I apply the Scala predefined identity function to a collection using the map method the original collection is returned unchanged.

No, it isn't. A new collection with identical contents is returned. Constructing this new collection will typically be O(n).

However, is the compiler smart enough to simply return the unchanged collection as an O(1) operation? Or will the identity function still be applied to each element in the original collection yielding a O(n) operation?

In order to perform such an optimization, the compiler will have to decide that the function to be applied is extensionally equal to the identity function. This problem is called the Function Problem and is known to be undecidable. (It can be proven using the Halting Problem, for example.)

Of course, it would be possible to optimize for the specific function Predef.identity, not just any identity function. But the Scala compiler designers are not fond of such one-off special-case optimizations, that only help standard library code. They prefer general optimizations that benefits all code.

Jörg W Mittag
  • 363,080
  • 75
  • 446
  • 653
  • Well, technically, the question only asks about doing this for `Predef.identity`, not for all functions extensionally equal to it, and this could be done quite simply. But slowing down very slightly all `map` operations to speed up a very rare case isn't worthwhile. – Alexey Romanov Sep 18 '16 at 08:16
  • Yeah, you're right. I added a final paragraph with a clarification. – Jörg W Mittag Sep 20 '16 at 11:08
0

Measuring execution time seems to indicate that identity function is O(n):

Function for measuring code execution time, from this link:

def time[R](block: => R): R = {  
    val t0 = System.nanoTime()
    val result = block    // call-by-name
    val t1 = System.nanoTime()
    println("Elapsed time: " + (t1 - t0) + "ns")
    result
}

time {(1 to 100000000).map(identity)}  // Elapsed time: 8893077396ns

time {(1 to 10).map(identity)} // Elapsed time: 341129ns

// while only creation of range takes similar order of magnitude times. 

time {(1 to 10)}  // Elapsed time: 30250ns

time {(1 to 100000000)} // Elapsed time: 32351ns
Samar
  • 2,091
  • 1
  • 15
  • 18