6

I made my own implementation of strlen in assembly, but it doesn't return the correct value. It consistently returns the string length + 4. I don't see why, and I hope one of you does.

Assembly source:

section .text
    [GLOBAL stringlen:] ; C function

stringlen:  
    push ebp
    mov ebp, esp        ; setup the stack frame

    mov ecx, [ebp+8]

    xor eax, eax        ; loop counter


startLoop:
    xor edx, edx
    mov edx, [ecx+eax]
    inc eax

    cmp edx, 0x0 ; null byte    
    jne startLoop
end:    
    pop ebp

    ret

And the main routine:

#include <stdio.h>

extern int stringlen(char *);

int main(void)
{
  printf("%d", stringlen("h"));

  return 0;
}

Thanks

Michel

6 Answers

6

You are not accessing bytes (characters), but doublewords. So your code is not looking for a single terminating zero; it is looking for 4 consecutive zero bytes. Note that it won't always return the correct value + 4: the result depends on what the memory after your string contains.

To fix, you should use byte accesses, for example by changing edx to dl.

Jester
  • I thought setting edx back to 0 in every loop would fix that too, but apparently it doesn't. Thanks for your answer. – Michel Feb 18 '11 at 14:23
  • It doesn't because the `mov edx, [ecx+eax]` will load 4 bytes from memory, overwriting whatever was in `edx` (zero in this case). – Jester Feb 18 '11 at 15:07
5

Thanks for your answers. Below is working code for anyone who has the same problem.

section .text
    [GLOBAL stringlen:]

stringlen:  
    push ebp
    mov ebp, esp

    mov edx, [ebp+8]    ; the string
    xor eax, eax        ; loop counter

    jmp if

then:
    inc eax

if:
    mov cl, [edx+eax]
    cmp cl, 0x0
    jne then

end:
    pop ebp
    ret
Michel
1

Not sure about the four, but it seems obvious it will always return the proper length + 1, since eax is always increased, even if the first byte read from the string is zero.

unwind
1

Change the line

mov edx, [ecx+eax]

to

mov dl, byte [ecx+eax]

and

  cmp edx, 0x0 ; null byte

to

  cmp dl, 0x0 ; null byte

This is because you have to compare only one byte at a time. The code with these two changes follows. It still has the off-by-one error of your original code: for "h" it returns 2, counting the h plus the null character.

section .text
    [GLOBAL stringlen:] ; C function

stringlen:
    push ebp
    mov ebp, esp        ; setup the stack frame

    mov ecx, [ebp+8]

    xor eax, eax        ; loop counter


startLoop:
    xor dx, dx
    mov dl, byte [ecx+eax]
    inc eax

    cmp dl, 0x0 ; null byte
    jne startLoop
end:
    pop ebp

    ret
Zimbabao
0

An easier way (zero-terminated ASCII strings only):

REPNE SCAS m8

http://pdos.csail.mit.edu/6.828/2006/readings/i386/REP.htm

sharow
-2

I think your inc should be after the jne. I'm not familiar with this assembly, so I don't really know.

Satya
  • I doubt that would be a good idea, because if you do that you'll never move to the next letter in the string, as the jump will execute before the increment. – Tony The Lion Feb 18 '11 at 12:59
  • Are you thinking of certain RISC architectures that have a branch delay slot, where the first instruction immediately following a jump is executed regardless of whether the jump is taken? – Chris Taylor Feb 24 '11 at 22:51