I'm writing some performance-critical code in Swift. After implementing all the optimizations I could think of, and profiling the application in Instruments, I came to realize that the vast majority of CPU cycles are spent performing `map()` and `reduce()` operations on arrays of Floats. So, just to see what would happen, I replaced all instances of `map` and `reduce` with good old-fashioned `for` loops. And to my amazement... the `for` loops were much, much faster!

A bit puzzled by this, I decided to perform some rough benchmarks. In one test, I had `map` return an array of Floats after performing some simple arithmetic, like so:
```swift
// Populate array with 1,000,000,000 random numbers
var array = [Float](count: 1_000_000_000, repeatedValue: 0)
for i in 0..<array.count {
    array[i] = Float(random())
}

let start = NSDate()

// Construct a new array, with each element from the original multiplied by 5
let output = array.map({ (element) -> Float in
    return element * 5
})

// Log the elapsed time
let elapsed = NSDate().timeIntervalSinceDate(start)
print(elapsed)
```
And the equivalent `for` loop implementation:

```swift
var output = [Float]()
for element in array {
    output.append(element * 5)
}
```
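One thing worth ruling out in that comparison: each `append` call can force the array's storage to grow and reallocate, so the loop version pays a cost that `map` (which can size its result up front) does not. A variant that pre-reserves capacity might be a fairer comparison; this is just a sketch (with a small stand-in array), and I haven't re-run the numbers with it:

```swift
// Stand-in for the large random array (assumption, for illustration only)
let array: [Float] = [1, 2, 3]

// Reserve the final size up front so append never has to reallocate
var output = [Float]()
output.reserveCapacity(array.count)
for element in array {
    output.append(element * 5)
}
```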
Average execution time for `map`: 20.1 seconds. Average execution time for the `for` loop: 11.2 seconds. Results were similar using Integers instead of Floats.

I created a similar benchmark to test the performance of Swift's `reduce`. This time, `reduce` and `for` loops achieved nearly the same performance when summing the elements of one large array. But when I loop the test 100,000 times like this:
```swift
// Populate array with 1,000,000 random numbers
var array = [Float](count: 1_000_000, repeatedValue: 0)
for i in 0..<array.count {
    array[i] = Float(random())
}

let start = NSDate()

// Perform operation 100,000 times
for _ in 0..<100_000 {
    let sum = array.reduce(0, combine: {$0 + $1})
}

// Log the elapsed time
let elapsed = NSDate().timeIntervalSinceDate(start)
print(elapsed)
```
vs:
```swift
for _ in 0..<100_000 {
    var sum: Float = 0
    for element in array {
        sum += element
    }
}
```
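My suspicion about the 0.000003-second figure: in the `for` version, `sum` is never read after the loop, so the optimizer is free to delete the entire computation as dead code, whereas the call to `reduce` may not be eliminated as easily. Accumulating into a value that is actually used afterwards should keep both versions honest. A sketch of what I mean (with a small stand-in array; I haven't verified how much this changes the numbers):

```swift
// Stand-in for the 1,000,000-element benchmark array (assumption, for illustration)
let array: [Float] = [1, 2, 3]

var total: Float = 0
for _ in 0..<100_000 {
    var sum: Float = 0
    for element in array {
        sum += element
    }
    // Use each iteration's result so the optimizer can't delete the loop
    total += sum
}

// Reading total here forces the additions to actually happen
print(total)
```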
The `reduce` method takes 29 seconds while the `for` loop takes (apparently) 0.000003 seconds.

Naturally I'm ready to disregard that last test as the result of a compiler optimization, but I think it may give some insight into how the compiler optimizes differently for `for` loops vs Swift's built-in array methods. Note that all tests were performed with -Os optimization on a 2.5 GHz i7 MacBook Pro. Results varied depending on array size and number of iterations, but `for` loops always outperformed the other methods by at least 1.5x, sometimes up to 10x.
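One more variable I'd like to isolate: the closure literal `{$0 + $1}` versus the bare operator. Since `+` is itself a function in Swift, it can be passed to `reduce` directly, giving the compiler one less closure to (possibly fail to) inline. Whether this actually benchmarks differently is an assumption I haven't verified; sketch below uses a small stand-in array and the same `combine:` label as the code above:

```swift
// Stand-in for the benchmark array (assumption, for illustration only)
let array: [Float] = [1, 2, 3]

// Pass the + operator function itself instead of a closure literal
let sum = array.reduce(0, combine: +)
print(sum)
```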
I'm a bit perplexed about Swift's performance here. Shouldn't the built-in Array methods be faster than the naive approach for performing such operations? Maybe somebody with more low-level knowledge than I can shed some light on the situation.