
While reading the assembly output of a simple program built with the Go compiler, I could not make sense of the string comparison implementation.

The program is:

package main

import (
    "fmt"
    "os"
)

func main() {

    fmt.Print("Enter the value: ")

    var v string
    fmt.Fscanf(os.Stdin, "%v", &v)

    if v != "123456" {
        fmt.Println("exit")
        os.Exit(2)
    }
    fmt.Println("v=", v)
}

In the extract below, what do the instructions at 0x48bd94 and 0x48bd9d do?

$ objdump  --disassemble=main.main ./z
...
  48bd85:   48 8b 54 24 38          mov    0x38(%rsp),%rdx
  48bd8a:   4c 8b 02                mov    (%rdx),%r8
  48bd8d:   48 83 7a 08 06          cmpq   $0x6,0x8(%rdx)              
  48bd92:   75 12                   jne    48bda6 <main.main+0xe6>          
  48bd94:   41 81 38 31 32 33 34    cmpl   $0x34333231,(%r8)           
  48bd9b:   75 09                   jne    48bda6 <main.main+0xe6>
  48bd9d:   66 41 81 78 04 35 36    cmpw   $0x3635,0x4(%r8)             
  48bda4:   74 49                   je     48bdef <main.main+0x12f>     
...
  • 2
    Are you sure that is the assembly for the code in the question? That looks more like `v != 123456` rather than `v != aaaaaa`. – Jester May 25 '22 at 12:44
  • 2
    You're looking at some other bit of the output. The relevant excerpt from the `go tool objdump` run on the result of the building your code using Go 1.18.1 for linux/amd64 is [this](https://pastebin.com/mEn3E1BR), and it involves three `CMP` instructions—with the 1st checking the length of the returned string, and the other two—comparing its bytes(0x61 is the ASCII code of 'a'). – kostix May 25 '22 at 13:01
  • 3
    If you use [godbolt.org](https://godbolt.org/z/4qzEsaPrf) and hover with your mouse over the output, it will show tooltips with extensive descriptions of each instruction and variable – blackgreen May 25 '22 at 13:04
  • 3
    Another approach is to run `go build -gcflags=-S your_source_file.go` — as it will dump to stderr the assembly code generated by the compiler. The upside of this approach is that the generated assembly output will be annotated by the file names and line numbers so you won't need to guess what bit of the source code the generated machine code represents. – kostix May 25 '22 at 13:12
  • 2
    Well, basically I'd say the problem is what @Jester said: your disassembly is for the code which also compared `v` against a string constant of length 6 but the contents "123456" while your source code contains another string constant. IOW, your code snippet does not match the machine code you've disassembled. – kostix May 25 '22 at 14:06
  • 2
    Still, the basic idea of what the compiler had generated is: 1) compare the length of the strings; if they do not match, the strings are obviously not equal; 2) use `cmpl` to compare the 1st 4 bytes of the string with the 4 bytes from the string literal; 3) if they are equal, use `cmpw` to compare the trailing 2 bytes. IOW, the compiler had deconstructed your string literal into a double word (4 bytes) and a word (2 bytes) making them immediate arguments to the `cmp*` instructions. Quite clever, I'd say. – kostix May 25 '22 at 14:09
  • @kostix, well yeah, I was not yet aware that `$0x34333231` was data. Jester's comment put me on the right track. If one tries with a much longer string, the generated assembly looks quite different. –  May 25 '22 at 14:18
  • 1
    You can run `man ascii` or use something like http://man-ascii.com/ ;-) The strange pattern of ever-decreasing byte values should have made you suspect something fishy has been happening ;-) – kostix May 25 '22 at 14:27
  • I’m voting to close this question because I cannot delete it –  Jun 04 '22 at 10:24

1 Answer

Thanks to the comments, it does make sense.

To summarize, those two instructions are part of a larger three-step implementation of the string comparison.

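At 0x48bd8d, it compares the string lengths: $0x6 is the hardcoded length of "123456" and 0x8(%rdx) is the length of the scanned value.

At 0x48bd94 and 0x48bd9d, it compares the string contents in two steps. The immediates are the bytes of the literal, not the random memory addresses I first took them for. x86 is little-endian, so 0x34333231 is stored in memory as the bytes 31 32 33 34 ("1234") and 0x3635 as 35 36 ("56"). Read digit by digit, the hex values look like "4321" and "65", which is simply the same bytes in reverse order.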

  48bd8d:   48 83 7a 08 06          cmpq   $0x6,0x8(%rdx)              
  48bd92:   75 12                   jne    48bda6 <main.main+0xe6>          
  48bd94:   41 81 38 31 32 33 34    cmpl   $0x34333231,(%r8)           
  48bd9b:   75 09                   jne    48bda6 <main.main+0xe6>
  48bd9d:   66 41 81 78 04 35 36    cmpw   $0x3635,0x4(%r8)             
  48bda4:   74 49                   je     48bdef <main.main+0x12f>   

And, yes, Jester got it right.

As kostix mentioned, this is an optimization; a comparison against a longer string will produce something like

  48bda0:   e8 5b 63 f7 ff          call   402100 <runtime.memequal>
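To try this yourself, a small variant along these lines can be built and disassembled the same way. The constant `longLiteral` and the function `matchesLong` are made-up names for the experiment, and whether a call to `runtime.memequal` actually shows up will likely depend on the Go version, the target architecture and the length of the literal.

```go
package main

import (
	"fmt"
	"os"
)

// longLiteral is an arbitrary long string chosen for this experiment.
const longLiteral = "0123456789abcdef0123456789abcdef0123456789abcdef"

// matchesLong compares its argument against the long literal.
// Disassemble the built binary to see how the comparison is compiled.
func matchesLong(v string) bool {
	return v == longLiteral
}

func main() {
	fmt.Print("Enter the value: ")

	var v string
	fmt.Fscanf(os.Stdin, "%v", &v)

	if !matchesLong(v) {
		fmt.Println("exit")
		os.Exit(2)
	}
	fmt.Println("v=", v)
}
```

Because `matchesLong` is small, the compiler may inline it, so the comparison may appear inside `main.main` in the disassembly rather than in a separate function.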

For my future self: https://cs.brown.edu/courses/cs033/docs/guides/x64_cheatsheet.pdf

  • 1
    This answer is still talking about `aaaaaa` at first (and then correctly talking about 4321 and 65), when the source this was compiled from certainly didn't use that. If you want to minimize confusion for future readers without redoing the disassembly, you could just change the question source and text to say `"123456"` everywhere you wrote `aaaaaa`, along with this answer. (Invalidating comments is a *good* thing; they exist to suggest improvements to the question. Unlike invalidating answers with question edits.) – Peter Cordes May 25 '22 at 16:54
  • that was the sharpest comment among all. –  May 25 '22 at 17:08