How can I loop through a string in x86_64 assembly NASM?

Question

I ask the user to give me a number. I receive that number as a string and I want to iterate through each char of the string to check its content. I have written this code where numero_a_convertir is the pointer to the string the user gave me:

;* BASE 8 A BINARIO
convertir_octal_a_base2:
    mov rbx, 0

    cmp dword[numero_a_convertir + rbx], "0"
    je imprimir

    cmp dword[numero_a_convertir + rbx], "1"
    je imprimir

    cmp dword[numero_a_convertir + rbx], "2"
    je imprimir

    cmp dword[numero_a_convertir + rbx], "3"
    je imprimir

    cmp dword[numero_a_convertir + rbx], "4"
    je imprimir

    cmp dword[numero_a_convertir + rbx], "5"
    je imprimir

    cmp dword[numero_a_convertir + rbx], "6"
    je imprimir

    cmp dword[numero_a_convertir + rbx], "7"
    je imprimir

    jmp fin_conversion
imprimir:
    mov     rcx, MENSAJE_A_IMPRIMIR_CHAR 
    mov     rdx, [numero_a_convertir + rbx]      
    sub     rsp, 32
    call    printf
    add     rsp, 32
fin_conversion:
ret

If the user inputs a string with only 1 character, for example "1", it works, because RBX is set to 0. But if the user inputs more than 1 character, for example "12" or "12345", it doesn't work anymore.

The same thing happens if I set RBX to 2. If the input is "123", it works. If the input is "12345", it breaks.

Why am I not able to access char number 3 of a 5-char string?

You don't need to check all 8 digits separately, just `c >= '0' && c <= '7'`. Or `c - '0' <= 7u` with a `sub` / `cmp` / `jb`. Non-digits will wrap outside that range, or never get down into it. — Peter Cordes, Nov 26 '22 at 23:21
I'm trying to code a base converter. I need to know the value of EACH digit to be able to convert them to the equivalent in another base. I've tried the cmp byte, that is not the problem — Guido, Nov 26 '22 at 23:27
Please show what is in the message *MENSAJE_A_IMPRIMIR_CHAR* that you pass to *printf*. — Sep Roland, Nov 27 '22 at 16:30
Comparing 4 bytes at once might not be the only problem, but it's definitely *a* problem. — Peter Cordes, Nov 27 '22 at 19:41
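
To illustrate the point made in these comments: `cmp dword [mem], "1"` compares four consecutive bytes of memory against 0x00000031, so it can only match when the three bytes after the digit are all zero. That would explain why the last character of the input matches and earlier ones do not, assuming the input buffer is zero-filled past the typed characters (the question does not show the buffer). A minimal sketch of a byte-sized check using the `sub` / `cmp` range test suggested above; the label `es_digito_octal` and the choice of RCX as the pointer are illustrative assumptions, not part of the question:

```nasm
; Sketch only: es_digito_octal is a hypothetical helper name.
; IN  (rcx = address of one character)
; OUT (eax = 1 if that byte is "0".."7", else 0)
es_digito_octal:
    movzx eax, byte [rcx]   ; load exactly ONE byte, zero-extended
    sub   eax, "0"          ; "0".."7" -> 0..7, anything else lands outside 0..7
    cmp   eax, 7            ; unsigned compare covers both ends of the range
    setbe al                ; AL = 1 when (c - "0") <= 7
    movzx eax, al
    ret
```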

score 2 · Answer 1 · answered Nov 27 '22 at 19:24

I receive that number as a string and I want to iterate through each char of the string to check its content.

Your string is defined by its address (numero_a_convertir) and by its length which is either a direct value (RCX) or a string terminating byte (0).
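
As a minimal sketch of walking a zero-terminated string one byte at a time, the routine below only counts the characters, which yields the length that the known-length version further down expects in RCX. The name `longitud_cadena` and the register assignment are assumptions made for illustration:

```nasm
; Sketch only: longitud_cadena is a hypothetical helper name.
; IN (rbx = address of zero-terminated string) OUT (rcx = length) MOD (rcx)
longitud_cadena:
    xor   ecx, ecx              ; RCX = 0 (index of the current character)
.Siguiente:
    cmp   byte [rbx + rcx], 0   ; compare ONE byte against the terminator
    je    .Fin
    inc   rcx                   ; advance to the next character
    jmp   .Siguiente
.Fin:
    ret
```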

You don't need separate steps for verifying the validity of the inputted characters and for converting from text to (unsigned) integer.

Each octal digit requires 3 bits in the resulting RAX register. The pair `rol rax, 3` / `test al, 7` makes sure an empty trio of bits is available so that `or rax, rdx` can stuff the newest digit in there.
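
A short straight-line trace may make this concrete. It shows the rotate-then-OR step unrolled by hand for the input "17", with the digit values written as immediates instead of being read from a string (an illustration only, not code to use as is):

```nasm
; Sketch only: the loop body unrolled for the two digits of "17".
    xor   eax, eax      ; RAX = 0
    rol   rax, 3        ; RAX = 0, low three bits are free
    or    rax, 1        ; digit "1": RAX = 1  (binary 001)
    rol   rax, 3        ; RAX = 8  (binary 001 000), low three bits free again
    or    rax, 7        ; digit "7": RAX = 15 (binary 001 111), i.e. octal 17
```

If the three highest bits of RAX were already in use, the rotation would bring them around into the three lowest bits, and that is what `test al, 7` detects.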

String with a known length in RCX

; IN (rbx,rcx) OUT (rax) MOD (rbx,rcx,rdx)
convertir_octal_a_base2:
    xor   eax, eax        ; RAX = 0
.Loop:
    movzx edx, byte [rbx] ; -> RDX = ["0","7"] (NewDigit) ?
    sub   edx, 48
    cmp   dl, 7
    ja    .NotDigit
    inc   rbx             ; Next character
    rol   rax, 3          ; Result = Result * 8
    test  al, 7
    jnz   .Overflow
    or    rax, rdx        ; Result = Result + NewDigit
    dec   rcx
    jnz   .Loop
    ret
.NotDigit:
    ???
.Overflow:
    ???

When the code stumbles upon a character that is not an octal digit, you have a choice to either reject the whole input or just return the value up to that point. The latter is what is often done in a high-level language like BASIC.
It's up to you to decide whether you will consider overflow or not, and what to do in case it occurs.
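
As one possible way to fill in those two decisions, here is a hedged sketch that rejects the whole input in both cases and reports the error through the carry flag. The name `convertir_octal_cf` and the carry-flag convention are assumptions chosen for illustration; other policies are just as valid:

```nasm
; Sketch only: hypothetical variant, the error policy is an assumption.
; IN (rbx = address, rcx = length, must be > 0)
; OUT (rax = value, CF = 1 on error) MOD (rbx,rcx,rdx)
convertir_octal_cf:
    xor   eax, eax          ; RAX = 0
.Loop:
    movzx edx, byte [rbx]
    sub   edx, "0"
    cmp   edx, 7
    ja    .Error            ; not an octal digit: reject the whole input
    inc   rbx               ; Next character
    rol   rax, 3            ; Result = Result * 8
    test  al, 7
    jnz   .Error            ; top three bits were in use: overflow
    or    rax, rdx          ; Result = Result + NewDigit (OR clears CF)
    dec   rcx               ; DEC leaves CF untouched
    jnz   .Loop
    ret                     ; CF = 0, RAX = converted value
.Error:
    stc                     ; CF = 1, RAX is not meaningful
    ret
```

A caller would then follow `call convertir_octal_cf` with a `jc` to its own error handling.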

String ending with the zero-terminator

; IN (rbx) OUT (rax) MOD (rbx,rdx)
convertir_octal_a_base2:
    xor   eax, eax        ; RAX = 0
    movzx edx, byte [rbx] ; -> RDX = ["0","7"] (NewDigit) ?
    sub   edx, 48
    cmp   dl, 7
    ja    .CouldBeNULL
.Loop:
    inc   rbx             ; Next character
    rol   rax, 3          ; Result = Result * 8
    test  al, 7
    jnz   .Overflow
    or    rax, rdx        ; Result = Result + NewDigit
    movzx edx, byte [rbx] ; -> RDX = ["0","7"] (NewDigit) ?
    sub   edx, 48
    cmp   dl, 8
    jb    .Loop           ; It's an octal digit
.CouldBeNULL:
    cmp   dl, -48
    jne   .NotDigit
    ret
.NotDigit:
    ???
.Overflow:
    ???

When the code reads the final zero terminator, the `sub edx, 48` instruction will produce DL=-48, which is different from anything we would get from an invalid character. Because this test sits outside the loop, the loop is more efficient.
When the code stumbles upon a character that is not an octal digit, you have a choice to either reject the whole input or just return the value up to that point. The latter is what is often done in a high-level language like BASIC.
It's up to you to decide whether you will consider overflow or not, and what to do in case it occurs.

If you don't need overflow detection, total = total*base + digit becomes `lea rax, [rax*8 + rdx]`. Knowing the length ahead of time means you can prove non-overflow when `length*3 < 64`, but there can be leading zeros, or the leading digit might be a `1`. — Peter Cordes, Nov 27 '22 at 19:46
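
A minimal sketch of that `lea` variant for a zero-terminated string, without any overflow detection. It stops at the first byte that is not an octal digit (the terminator included) and returns the value accumulated so far; the name `convertir_octal_sin_overflow` and that stopping policy are assumptions made for illustration:

```nasm
; Sketch only: hypothetical variant without overflow detection.
; IN (rbx = address of zero-terminated string) OUT (rax) MOD (rbx,rdx)
convertir_octal_sin_overflow:
    xor   eax, eax              ; RAX = 0
.Loop:
    movzx edx, byte [rbx]
    sub   edx, "0"
    cmp   edx, 7
    ja    .Fin                  ; terminator or non-digit: stop here
    lea   rax, [rax*8 + rdx]    ; Result = Result * 8 + NewDigit
    inc   rbx                   ; Next character
    jmp   .Loop
.Fin:
    ret
```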
