I wrote this microbenchmark to better understand Go's performance characteristics, so that I would be able to make intelligent choices as to when to use it.
I thought this would be the ideal scenario for Go from a performance-overhead point of view:
- no allocations / deallocations inside the loop
- array access clearly within bounds (bounds checks could be removed)
Still, the Go version runs exactly 4 times slower than the C version compiled with gcc -O3 on AMD64. Why is that?
(Timed using the shell. Each run takes a few seconds, so startup time is negligible.)
The Go version:

    package main

    import "fmt"

    func main() {
        fmt.Println("started")

        var n int32 = 1024 * 32
        a := make([]int32, n, n)
        b := make([]int32, n, n)

        var it, i, j int32
        for i = 0; i < n; i++ {
            a[i] = i
            b[i] = -i
        }

        var r int32 = 10
        var sum int32 = 0
        for it = 0; it < r; it++ {
            for i = 0; i < n; i++ {
                for j = 0; j < n; j++ {
                    sum += (a[i] + b[j]) * (it + 1)
                }
            }
        }
        fmt.Printf("n = %d, r = %d, sum = %d\n", n, r, sum)
    }
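As a cross-check on the shell timings, the loops can also be timed from inside the program. The following is only a sketch using the standard `time` package; the names `kernel` and `timeKernel` are hypothetical helpers introduced here, not part of the original benchmark, and `main` uses a smaller `n` so that it finishes quickly.

```go
package main

import (
	"fmt"
	"time"
)

// kernel (hypothetical name) is the benchmark's setup plus triple loop,
// parameterised on n and r.
func kernel(n, r int32) int32 {
	a := make([]int32, n)
	b := make([]int32, n)
	for i := int32(0); i < n; i++ {
		a[i] = i
		b[i] = -i
	}
	var sum int32
	for it := int32(0); it < r; it++ {
		for i := int32(0); i < n; i++ {
			for j := int32(0); j < n; j++ {
				sum += (a[i] + b[j]) * (it + 1)
			}
		}
	}
	return sum
}

// timeKernel (hypothetical name) returns the result together with the
// wall-clock time spent in kernel. The measurement includes the O(n) array
// setup, which is small next to the O(r*n*n) loops.
func timeKernel(n, r int32) (int32, time.Duration) {
	start := time.Now()
	sum := kernel(n, r)
	return sum, time.Since(start)
}

func main() {
	// Smaller than the benchmark's 1024 * 32, chosen only to keep the run short.
	sum, elapsed := timeKernel(1024*2, 10)
	fmt.Printf("sum = %d, elapsed = %v\n", sum, elapsed)
}
```

Timing inside the process excludes program startup and the `fmt` calls, so it should agree closely with the shell numbers if startup really is negligible.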
The C version:
    #include <stdint.h>  /* int32_t */
    #include <stdio.h>
    #include <stdlib.h>

    int main() {
        printf("started\n");

        int32_t n = 1024 * 32;
        int32_t* a = malloc(sizeof(int32_t) * n);
        int32_t* b = malloc(sizeof(int32_t) * n);

        for(int32_t i = 0; i < n; ++i) {
            a[i] = i;
            b[i] = -i;
        }

        int32_t r = 10;
        int32_t sum = 0;
        for(int32_t it = 0; it < r; ++it) {
            for(int32_t i = 0; i < n; ++i) {
                for(int32_t j = 0; j < n; ++j) {
                    sum += (a[i] + b[j]) * (it + 1);
                }
            }
        }
        printf("n = %d, r = %d, sum = %d\n", n, r, sum);

        free(a);
        free(b);
    }
Updates:
- Using `range`, as suggested, speeds Go up by a factor of 2.
- On the other hand, `-march=native` speeds C up by a factor of 2, in my tests. (And `-mno-sse` gives a compile error, apparently incompatible with `-O3`.)
- GCCGO seems comparable to GCC here (and does not need `range`).
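For reference, the `range` rewrite mentioned in the first update might look roughly like the sketch below. The function names `sumIndexed` and `sumRange` are hypothetical, and the explanation for the speedup is an assumption rather than something measured here: with `range` the loops yield the elements directly, so the compiler presumably no longer has to check each explicit index against the slice length.

```go
package main

import "fmt"

// sumIndexed mirrors the loops of the original benchmark: every access
// goes through an explicit int32 index. It assumes len(b) == len(a).
func sumIndexed(a, b []int32, r int32) int32 {
	n := int32(len(a))
	var sum int32
	for it := int32(0); it < r; it++ {
		for i := int32(0); i < n; i++ {
			for j := int32(0); j < n; j++ {
				sum += (a[i] + b[j]) * (it + 1)
			}
		}
	}
	return sum
}

// sumRange computes the same value, but iterates with range so that the
// loops hand back the elements instead of indices.
func sumRange(a, b []int32, r int32) int32 {
	var sum int32
	for it := int32(0); it < r; it++ {
		for _, ai := range a {
			for _, bj := range b {
				sum += (ai + bj) * (it + 1)
			}
		}
	}
	return sum
}

func main() {
	// Smaller than the benchmark's 1024 * 32 so that this runs instantly.
	const n = 1024
	a := make([]int32, n)
	b := make([]int32, n)
	for i := range a {
		a[i] = int32(i)
		b[i] = -int32(i)
	}
	// Both variants add the same terms in the same order, so they must agree.
	fmt.Println(sumIndexed(a, b, 10) == sumRange(a, b, 10))
}
```

Whether the factor of 2 carries over depends on the compiler version and target, so it is worth timing both variants on the actual setup.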