
Rewriting a simple program from C# to Go, I found the resulting executable 3 to 4 times slower. In particular, the Go version uses 3 to 4 times more CPU. That's surprising because the code does a lot of I/O and is not supposed to consume a significant amount of CPU.

I wrote a very simple version that only does sequential writes and benchmarked it. I ran the same benchmarks on Windows 10 and Linux (Debian Jessie). The timings can't be compared (different systems, disks, ...), but the result is interesting.

I'm using the same Go version on both platforms : 1.6

On Windows, os.File.Write uses cgo (see runtime.cgocall below), but not on Linux. Why?

Here is the disk.go program :

    package main

    import (
        "crypto/rand"
        "fmt"
        "os"
        "time"
    )

    const (
        // size of the test file
        fullSize = 268435456
        // size of read/write per call
        partSize = 128
        // path of temporary test file
        filePath = "./bigfile.tmp"
    )

    func main() {
        buffer := make([]byte, partSize)

        seqWrite := func() error {
            return sequentialWrite(filePath, fullSize, buffer)
        }

        err := fillBuffer(buffer)
        panicIfError(err)
        duration, err := durationOf(seqWrite)
        panicIfError(err)
        fmt.Printf("Duration : %v\n", duration)
    }

    // It's just a test ;)
    func panicIfError(err error) {
        if err != nil {
            panic(err)
        }
    }

    func durationOf(f func() error) (time.Duration, error) {
        startTime := time.Now()
        err := f()
        return time.Since(startTime), err
    }

    func fillBuffer(buffer []byte) error {
        _, err := rand.Read(buffer)
        return err
    }

    func sequentialWrite(filePath string, fullSize int, buffer []byte) error {
        desc, err := os.OpenFile(filePath, os.O_WRONLY|os.O_CREATE, 0666)
        if err != nil {
            return err
        }
        defer func() {
            desc.Close()
            err := os.Remove(filePath)
            panicIfError(err)
        }()

        var totalWrote int
        for totalWrote < fullSize {
            wrote, err := desc.Write(buffer)
            totalWrote += wrote
            if err != nil {
                return err
            }
        }

        return nil
    }

The benchmark test (disk_test.go) :

    package main

    import (
        "testing"
    )

    // go test -bench SequentialWrite -cpuprofile=cpu.out
    // Windows : go tool pprof -text -nodecount=10 ./disk.test.exe cpu.out
    // Linux : go tool pprof -text -nodecount=10 ./disk.test cpu.out
    func BenchmarkSequentialWrite(b *testing.B) {
        buffer := make([]byte, partSize)
        // Run the workload b.N times, as the testing package expects.
        for i := 0; i < b.N; i++ {
            err := sequentialWrite(filePath, fullSize, buffer)
            panicIfError(err)
        }
    }

The Windows result (with cgo) :

    11.68s of 11.95s total (97.74%)
    Dropped 18 nodes (cum <= 0.06s)
    Showing top 10 nodes out of 26 (cum >= 0.09s)
          flat  flat%   sum%        cum   cum%
        11.08s 92.72% 92.72%     11.20s 93.72%  runtime.cgocall
         0.11s  0.92% 93.64%      0.11s  0.92%  runtime.deferreturn
         0.09s  0.75% 94.39%     11.45s 95.82%  os.(*File).write
         0.08s  0.67% 95.06%      0.16s  1.34%  runtime.deferproc.func1
         0.07s  0.59% 95.65%      0.07s  0.59%  runtime.newdefer
         0.06s   0.5% 96.15%      0.28s  2.34%  runtime.systemstack
         0.06s   0.5% 96.65%     11.25s 94.14%  syscall.Write
         0.05s  0.42% 97.07%      0.07s  0.59%  runtime.deferproc
         0.04s  0.33% 97.41%     11.49s 96.15%  os.(*File).Write
         0.04s  0.33% 97.74%      0.09s  0.75%  syscall.(*LazyProc).Find

The Linux result (without cgo) :

    5.04s of 5.10s total (98.82%)
    Dropped 5 nodes (cum <= 0.03s)
    Showing top 10 nodes out of 19 (cum >= 0.06s)
          flat  flat%   sum%        cum   cum%
         4.62s 90.59% 90.59%      4.87s 95.49%  syscall.Syscall
         0.09s  1.76% 92.35%      0.09s  1.76%  runtime/internal/atomic.Cas
         0.08s  1.57% 93.92%      0.19s  3.73%  runtime.exitsyscall
         0.06s  1.18% 95.10%      4.98s 97.65%  os.(*File).write
         0.04s  0.78% 95.88%      5.10s   100%  _/home/sam/Provisoire/go-disk.sequentialWrite
         0.04s  0.78% 96.67%      5.05s 99.02%  os.(*File).Write
         0.04s  0.78% 97.45%      0.04s  0.78%  runtime.memclr
         0.03s  0.59% 98.04%      0.08s  1.57%  runtime.exitsyscallfast
         0.02s  0.39% 98.43%      0.03s  0.59%  os.epipecheck
         0.02s  0.39% 98.82%      0.06s  1.18%  runtime.casgstatus
samonzeweb
  • I recall reading somewhere that Windows OSes don't have a syscall interface like Unix systems do and instead expose a C API. Not sure how true that is though. – Ainar-G Mar 04 '16 at 15:36
  • I could be wrong, but looking at https://github.com/golang/go/blob/master/src/syscall/zsyscall_windows.go, it seems like all syscalls go through cgo, but I'm not really familiar with Windows. This question would fit better on the golangnuts ML. – OneOfOne Mar 04 '16 at 15:38
  • In fact Windows *does* have syscalls. The problem is that, contrary to certain other operating system kernels (Linux included) which have stable syscall number tables that only get extended by adding new syscalls, Windows never published these numbers, and they *do* differ between different versions of the kernels of this family of OS. They might even legitimately differ between, say, different service packs. Since the only documented way to access these syscalls is via DLLs, that's what Go is supposedly doing. – kostix Mar 04 '16 at 16:40
  • Note that what you're observing is one of the reasons for the existence of the `bufio` package: syscalls are expensive in any OS which has privilege separation between processes and the kernel, as the execution context has to be "gated" from user space to kernel space and back. If you'd use `io.Copy()`, which uses a 32KiB buffer IIRC, the results would be substantially better I think. – kostix Mar 04 '16 at 16:44
  • @OneOfOne: "I'm not really familiar with windows. this question would fit better on the golangnuts." It's not necessary to ask the question on golang-nuts. There are many Windows experts on Stack Overflow. – peterSO Mar 04 '16 at 17:05
  • The purpose of the test is not to use buffered I/O, but to act in the same way as the C# code. But it's an interesting point. In fact the performance difference between the C# version and the Go version is due to ... C# FileStream buffering! I know a bit about C#/.NET, but I'm not a specialist, and I didn't think about that at first. I did some research and found that System.IO.FileStream uses buffered I/O in some cases. So we can't compare the two versions. But the main purpose of the question is about cgo, and peterSO posted a reply below. Thanks. – samonzeweb Mar 04 '16 at 17:19

1 Answer

Go does not perform file I/O itself; it delegates the task to the operating system. See Go's operating-system-dependent syscall packages.

Linux and Windows are different operating systems with different OS ABIs. For example, Linux makes syscalls via syscall.Syscall, while Windows goes through Windows DLLs. On Windows, the DLL call is a C call. It doesn't use cgo, but it does go through the same dynamic C pointer check used by cgo, runtime.cgocall. There is no runtime.wincall alias.

In summary, different operating systems have different OS call mechanisms.

Command cgo

Passing pointers

Go is a garbage collected language, and the garbage collector needs to know the location of every pointer to Go memory. Because of this, there are restrictions on passing pointers between Go and C.

In this section the term Go pointer means a pointer to memory allocated by Go (such as by using the & operator or calling the predefined new function) and the term C pointer means a pointer to memory allocated by C (such as by a call to C.malloc). Whether a pointer is a Go pointer or a C pointer is a dynamic property determined by how the memory was allocated; it has nothing to do with the type of the pointer.

Go code may pass a Go pointer to C provided the Go memory to which it points does not contain any Go pointers. The C code must preserve this property: it must not store any Go pointers in Go memory, even temporarily. When passing a pointer to a field in a struct, the Go memory in question is the memory occupied by the field, not the entire struct. When passing a pointer to an element in an array or slice, the Go memory in question is the entire array or the entire backing array of the slice.

C code may not keep a copy of a Go pointer after the call returns.

A Go function called by C code may not return a Go pointer. A Go function called by C code may take C pointers as arguments, and it may store non-pointer or C pointer data through those pointers, but it may not store a Go pointer in memory pointed to by a C pointer. A Go function called by C code may take a Go pointer as an argument, but it must preserve the property that the Go memory to which it points does not contain any Go pointers.

Go code may not store a Go pointer in C memory. C code may store Go pointers in C memory, subject to the rule above: it must stop storing the Go pointer when the C function returns.

These rules are checked dynamically at runtime. The checking is controlled by the cgocheck setting of the GODEBUG environment variable. The default setting is GODEBUG=cgocheck=1, which implements reasonably cheap dynamic checks. These checks may be disabled entirely using GODEBUG=cgocheck=0. Complete checking of pointer handling, at some cost in run time, is available via GODEBUG=cgocheck=2.

It is possible to defeat this enforcement by using the unsafe package, and of course there is nothing stopping the C code from doing anything it likes. However, programs that break these rules are likely to fail in unexpected and unpredictable ways.

"These rules are checked dynamically at runtime."


Benchmarks:

To paraphrase, there are lies, damn lies, and benchmarks.

For valid comparisons across operating systems you need to run on identical hardware: CPUs, memory, and disks (spinning rust or silicon) all affect the result. I dual-boot Linux and Windows on the same machine.

Run benchmarks at least three times back-to-back. Operating systems try to be smart. For example, caching I/O. Languages using virtual machines need warm-up time. And so on.

Know what you are measuring. If you are doing sequential I/O, you spend almost all your time in the operating system. Have you turned off malware protection? And so on.

And so on.

Here are some results for disk.go from the same machine using dual-boot Windows and Linux.

Windows:

    >go build disk.go
    >/TimeMem disk
    Duration : 18.3300322s
    Elapsed time   : 18.38
    Kernel time    : 13.71 (74.6%)
    User time      : 4.62 (25.1%)

Linux:

    $ go build disk.go
    $ time ./disk
    Duration : 18.54350723s
    real    0m18.547s
    user    0m2.336s
    sys     0m16.236s

Effectively, they are the same: about 18 seconds for disk.go on both. There is just some variation between operating systems in what is counted as user time and what is counted as kernel or system time. Elapsed or real time is the same.

In your tests, kernel or system time was 93.72% runtime.cgocall versus 95.49% syscall.Syscall.

peterSO
  • Thanks. So `runtime.cgocall` does not mean it uses cgo, but it does a similar job. – samonzeweb Mar 04 '16 at 17:14
  • @samonzeweb: `runtime.cgocall` time measures the time spent outside Go when calling via `cgo` and, on Windows, when calling Windows `dll`s. In your `disk.go` program all the calls outside Go are to Windows `dll`s. – peterSO Mar 04 '16 at 18:49
  • @samonzeweb: See my revised answer for some comments on benchmarks and some benchmark results. – peterSO Mar 04 '16 at 18:50
  • Also, if this answered your question, please accept it @samonzeweb – Derek Pollard Mar 04 '16 at 18:54
  • As I said myself, I didn't compare performance between Windows and Linux, as it was not the same hardware. I wanted to see whether the call to `runtime.cgocall` is specific to Windows or not, and so I asked the question about this difference. It's not a performance question, but an implementation one. I have my answer, thanks. And even if benchmarking Linux against Windows wasn't the purpose, your benchmark is welcome; I would have ended up doing it sooner or later ;) – samonzeweb Mar 04 '16 at 19:04