it would result: "HBACTeyparleprelloe"
I sure hope this was a typo, because otherwise this would become a very nasty exercise indeed! I will assume "HBACTeyparleprelleoe".
it can intercalate from strings with the same size
Your present code seems to do that correctly, but why is it so convoluted?
If the current index (offset in the string) is 0, you just do movsb. And if the current index isn't 0, meaning you need to skip ahead, you do so with a (wasteful) loop of lodsb instructions. Sometimes people wonder why rep lodsb is allowed; well, here they have a bit of a use case. Not really, though, since the practical solution would be to replace:
mov esi,cad1
cld
mov ecx,ebx
cmp ebx,0
jne THEN1
je ELSE1
THEN1:
lodsb
loop THEN1
ELSE1:
movsb
entirely by:
lea esi, [cad1 + ebx]
movsb
or alternatively by:
movzx eax, byte [cad1 + ebx]
stosb
I don't know what to do to make it work for different sized strings
Below I will present 3 solutions, all tested.
Solution 1
Because there are exactly 5 input strings, the 32-bit x86 architecture has just the right number of registers to keep each string pointer in its own register. This approach gives the fastest code, but only if the lengths of the individual strings don't differ too much.
S: db 43 dup 0
S1: db "Hello", 10
S2: db "Bye", 10
S3: db "AppleADayKeepsTheDoctorAway", 10
S4: db "Car", 10
S5: db "Tree", 10
...
Begin: mov ebx, S1 ; Addresses of the input strings
mov ecx, S2
mov edx, S3
mov esi, S4
mov edi, S5
mov ebp, S ; Address of the output string
.a: push ebp ; (1)
movzx eax, byte [ebx] ; Read a character from this string
cmp al, 10 ; If this string is exhausted, then
je .b ; no longer add to the output string
inc ebx ; Go to the next character in this string
mov [ebp], al ; Add character to the output string
inc ebp
.b: movzx eax, byte [ecx] ; Read a character from this string
cmp al, 10 ; If this string is exhausted, then
je .c ; no longer add to the output string
inc ecx ; Go to the next character in this string
mov [ebp], al ; Add character to the output string
inc ebp
.c: movzx eax, byte [edx] ; Read a character from this string
cmp al, 10 ; If this string is exhausted, then
je .d ; no longer add to the output string
inc edx ; Go to the next character in this string
mov [ebp], al ; Add character to the output string
inc ebp
.d: movzx eax, byte [esi] ; Read a character from this string
cmp al, 10 ; If this string is exhausted, then
je .e ; no longer add to the output string
inc esi ; Go to the next character in this string
mov [ebp], al ; Add character to the output string
inc ebp
.e: movzx eax, byte [edi] ; Read a character from this string
cmp al, 10 ; If this string is exhausted, then
je .f ; no longer add to the output string
inc edi ; Go to the next character in this string
mov [ebp], al ; Add character to the output string
inc ebp
.f: pop eax ; (1)
cmp eax, ebp ; Was anything added to the output string ?
jne .a ; Yes, then repeat
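To check the logic of Solution 1 outside of assembly, here is a small C model of it. It is a sketch under stated assumptions, not a replacement for the code above:
- the function name `interleave5` is hypothetical;
- the strings end with 10 ('\n'), as in the data above;
- the caller supplies an output buffer large enough for all characters plus a terminating zero.

It says nothing about the relative speed of the assembly versions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* C model of Solution 1 (hypothetical name). Each string ends with 10 ('\n'). */
size_t interleave5(const char *s1, const char *s2, const char *s3,
                   const char *s4, const char *s5, char *out)
{
    /* One cursor per string, like EBX, ECX, EDX, ESI and EDI in the asm. */
    const char *cur[5] = { s1, s2, s3, s4, s5 };
    char *dst = out;   /* output position, like EBP */
    char *start;       /* position at the start of a pass, like the pushed EBP */

    do {
        start = dst;
        /* The asm unrolls this loop into the five blocks .a to .e. */
        for (int k = 0; k < 5; k++) {
            if (*cur[k] != '\n')        /* string not yet exhausted */
                *dst++ = *cur[k]++;     /* copy one char, advance the cursor */
        }
    } while (dst != start);             /* stop after a pass that added nothing */

    *dst = '\0';   /* the asm relies on S being zero-filled instead */
    return (size_t)(dst - out);
}
```

Called with the five test strings, it should produce HBACTeyparleprelleoeADayKeepsTheDoctorAway (42 characters).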
Solution 2
A minor edit allows us to process any number of input strings. This approach is slower than Solution 1, and it suffers from the need to pad the strings so they all have the same length (as you had it in your question).
S: db 43 dup 0
S1: db "Hello", 22 dup 10, 10
S2: db "Bye", 24 dup 10, 10
S3: db "AppleADayKeepsTheDoctorAway", 10
S4: db "Car", 24 dup 10, 10
S5: db "Tree", 23 dup 10, 10
...
Begin: xor ebx, ebx ; Current offset in every string
mov ebp, S ; Address of the output string
.a: push ebp ; (1)
movzx eax, byte [S1 + ebx] ; Read a character from this string
cmp al, 10 ; If this string is exhausted, then
je .b ; no longer add to the output string
mov [ebp], al ; Add character to the output string
inc ebp
.b: movzx eax, byte [S2 + ebx] ; Read a character from this string
cmp al, 10 ; If this string is exhausted, then
je .c ; no longer add to the output string
mov [ebp], al ; Add character to the output string
inc ebp
.c: movzx eax, byte [S3 + ebx] ; Read a character from this string
cmp al, 10 ; If this string is exhausted, then
je .d ; no longer add to the output string
mov [ebp], al ; Add character to the output string
inc ebp
.d: movzx eax, byte [S4 + ebx] ; Read a character from this string
cmp al, 10 ; If this string is exhausted, then
je .e ; no longer add to the output string
mov [ebp], al ; Add character to the output string
inc ebp
.e: movzx eax, byte [S5 + ebx] ; Read a character from this string
cmp al, 10 ; If this string is exhausted, then
je .f ; no longer add to the output string
mov [ebp], al ; Add character to the output string
inc ebp
.f: inc ebx ; Go to next character in every string
pop eax ; (1)
cmp eax, ebp ; Was anything added to the output string ?
jne .a ; Yes, then repeat
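A C sketch of Solution 2 makes the padding requirement explicit. A single shared offset (EBX above) reads every string at the same position. Each input must therefore be padded with 10 bytes up to the length of the longest one, terminator included. The function name `interleave_padded` is hypothetical, and the output buffer is assumed to be large enough.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* C model of Solution 2 (hypothetical name). Every strs[k] must be padded
   with '\n' (10) so that strs[k][i] is readable for every offset i the loop
   visits, up to and including the longest string's terminator. */
size_t interleave_padded(const char *const strs[], size_t count, char *out)
{
    char *dst = out;
    char *start;
    size_t i = 0;                      /* shared offset, like EBX */

    do {
        start = dst;
        for (size_t k = 0; k < count; k++) {
            char c = strs[k][i];
            if (c != '\n')             /* padding and terminator are skipped */
                *dst++ = c;
        }
        i++;                           /* .f: go to the next offset */
    } while (dst != start);            /* stop after a pass that added nothing */

    *dst = '\0';
    return (size_t)(dst - out);
}
```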
Solution 3
This time we create an array with pointers to the individual strings. These pointers are used in succession to retrieve a character from the associated string, and when we encounter the end-of-string marker (10), we simply remove the corresponding pointer from the array. The other solutions kept dealing with an exhausted string, but here an exhausted string vanishes from the loop.
Because this method has more housekeeping to do, it will run slower on your very regular test data. However, once you feed it a more realistic data set, one with short and long strings, it will shine... There is also no limit on the number of input strings, and it requires neither padding nor same-size string buffers (like in your program).
P: dd S1, S2, S3, S4, S5, 0
S: db 43 dup 0
S1: db "Hello", 10
S2: db "Bye", 10
S3: db "AppleADayKeepsTheDoctorAway", 10
S4: db "Car", 10
S5: db "Tree", 10
...
Begin: mov ebp, S ; Address of the output string
jmp .e
.a: mov edi, ebx
.b: mov eax, [edi+4] ; Move all the stringpointers that follow
mov [edi], eax ; one position down in the array
add edi, 4
test eax, eax ; Until the zero-terminator got moved down
jnz .b
jmp .d ; Continue with the next stringpointer
.c: movzx eax, byte [esi] ; Read a character from the current string
cmp al, 10 ; If this string is exhausted, then
je .a ; go remove its pointer from the array
inc esi ; Go to the next character in the current string
mov [ebx], esi ; Update the current stringpointer
add ebx, 4 ; Go to the next stringpointer
mov [ebp], al ; Add character to the output string
inc ebp
.d: mov esi, [ebx] ; Get current stringpointer
test esi, esi ; Arrived at the end of the array if ESI is zero
jnz .c
.e: mov ebx, P ; Address of the array with stringpointers
mov esi, [ebx] ; Get current stringpointer
test esi, esi ; The array is empty if the 1st dword is zero
jnz .c
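The pointer-array idea of Solution 3 can be modelled in C as follows. This is a hedged sketch, not an instruction-by-instruction translation:
- `interleave_ptrs` is a hypothetical name;
- the array is assumed to be NULL-terminated, like the 0 at the end of P;
- the array is modified in place, just as the assembly modifies P.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* C model of Solution 3 (hypothetical name). ptrs[] is a NULL-terminated
   array of cursors into strings that end with '\n' (10). */
size_t interleave_ptrs(const char *ptrs[], char *out)
{
    char *dst = out;

    while (ptrs[0] != NULL) {                 /* .e: array not yet empty */
        size_t k = 0;
        while (ptrs[k] != NULL) {             /* .d: end of the array? */
            char c = *ptrs[k];
            if (c == '\n') {                  /* .a/.b: remove slot k by moving */
                size_t j = k;                 /* the following pointers down,   */
                do {                          /* NULL terminator included       */
                    ptrs[j] = ptrs[j + 1];
                } while (ptrs[j++] != NULL);
                continue;                     /* slot k now holds the next cursor */
            }
            ptrs[k]++;                        /* update the current cursor */
            k++;                              /* go to the next cursor */
            *dst++ = c;
        }
    }

    *dst = '\0';
    return (size_t)(dst - out);
}
```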
| Solution 1 | Solution 2 | Solution 3 | Comment |
|---|---|---|---|
| 0.4 µsec | 0.5 µsec | 1.1 µsec | 5 short strings |
| 2.0 µsec | 2.1 µsec | 1.5 µsec | with 1 long string |
Expected output:
HBACTeyparleprelleoeADayKeepsTheDoctorAway
[EDIT]
Building upon the many ideas kindly provided by @PeterCordes in the comments, and adding a couple of new ideas of my own, I was able to write the following faster solutions. (I have dropped the earlier Solution 2 because of the excessive padding it requires.)
Solution 1b
Switching the roles of EBP and EDI, as Peter suggested, already improved the code by 25%. Adding instructions that set the pointer of an exhausted string to zero, which gives a cheap way to stop processing that string, improved it by another 20%. I did give stosb a chance, but abandoned the idea because it made the code run 17% slower.
S: db 43 dup 0
S1: db "Hello", 10
S2: db "Bye", 10
S3: db "AppleADayKeepsTheDoctorAway", 10
S4: db "Car", 10
S5: db "Tree", 10
...
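mov ebx, S1 ; Addresses of the input strings
mov ecx, S2
mov edx, S3
mov esi, S4
mov ebp, S5 ; EBP now holds a string pointer
mov edi, S ; EDI is now the address of the output string
.a: push edi ; Remember the output position at the start of this pass
test ebx, ebx ; A zero pointer means this string was exhausted earlier
jz .b ; so skip it
movzx eax, byte [ebx] ; Read a character from this string
cmp al, 10 ; If this string is exhausted, then
je .clr1 ; go zero its pointer
inc ebx ; Go to the next character in this string
mov [edi], al ; Add character to the output string
inc edi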
.b: test ecx, ecx
jz .c
movzx eax, byte [ecx]
cmp al, 10
je .clr2
inc ecx
mov [edi], al
inc edi
.c: test edx, edx
jz .d
movzx eax, byte [edx]
cmp al, 10
je .clr3
inc edx
mov [edi], al
inc edi
.d: test esi, esi
jz .e
movzx eax, byte [esi]
cmp al, 10
je .clr4
inc esi
mov [edi], al
inc edi
.e: test ebp, ebp
jz .f
movzx eax, byte [ebp]
cmp al, 10
je .clr5
inc ebp
mov [edi], al
inc edi
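.f: pop eax ; Output position at the start of this pass
cmp eax, edi ; Was anything added to the output string ?
jne .a ; Yes, then repeat
...
.clr1: xor ebx, ebx ; Zero the pointer of the exhausted string
jmp .b ; and continue with the next string
.clr2: xor ecx, ecx
jmp .c
.clr3: xor edx, edx
jmp .d
.clr4: xor esi, esi
jmp .e
.clr5: xor ebp, ebp
jmp .f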
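For comparison, here is a C sketch of the idea behind Solution 1b. A cursor is cleared to NULL as soon as its terminator is seen, mirroring the .clrN stubs that zero a register. Later passes then skip that string with a single test.

Assumptions:
- the name `interleave5_clear` is hypothetical;
- the output buffer is assumed to be large enough.

The C version does not reproduce the register-level effects that made 1b faster.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* C model of Solution 1b (hypothetical name). Strings end with '\n' (10). */
size_t interleave5_clear(const char *s1, const char *s2, const char *s3,
                         const char *s4, const char *s5, char *out)
{
    const char *cur[5] = { s1, s2, s3, s4, s5 };
    char *dst = out;   /* output position, like EDI */
    char *start;       /* position at the start of a pass, like the pushed EDI */

    do {
        start = dst;
        for (int k = 0; k < 5; k++) {
            if (cur[k] == NULL)          /* test reg, reg / jz: already exhausted */
                continue;
            if (*cur[k] == '\n') {       /* .clrN: xor reg, reg */
                cur[k] = NULL;
                continue;
            }
            *dst++ = *cur[k]++;          /* copy one char, advance the cursor */
        }
    } while (dst != start);              /* stop after a pass that added nothing */

    *dst = '\0';
    return (size_t)(dst - out);
}
```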
Solution 3b
The key improvements are:
- having the top of the inner loop (.c) 16-byte-aligned
- maintaining a count of pointers instead of zero-terminating the array
- exiting early so that the remainder of the last remaining string can be copied verbatim
This time, using stosb did not hurt the execution time (its only gain is smaller code size), so I kept it.
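The alignment directive in the code below computes how many filler bytes to place before .a so that .c starts on a 16-byte boundary. Because the filler follows the unconditional jmp .c, it is never executed.

Here is a C sketch of that arithmetic, under these assumptions:
- FASM evaluates `-` before `and`;
- 21 is the byte size of the instructions from .a up to .c (my count for their 32-bit encodings);
- `pad_before` is a hypothetical helper, not part of the original code.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors  db (16-($+21) and 15) dup 0 : 'here' plays the role of $ and
   'bytes_before_label' the role of 21. Unsigned wrap-around is harmless
   because 2^32 is a multiple of 16, so the low 4 bits stay correct. */
uint32_t pad_before(uint32_t here, uint32_t bytes_before_label)
{
    return (16u - (here + bytes_before_label)) & 15u;
}
```

For example, at address 0x1000 it yields 11 filler bytes, placing .c at 0x1000 + 11 + 21 = 0x1020.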
P: dd S1, S2, S3, S4, S5
S: db 43 dup 0
S1: db "Hello", 10
S2: db "Bye", 10
S3: db "AppleADayKeepsTheDoctorAway", 10
S4: db "Car", 10
S5: db "Tree", 10
...
mov ebx, P ; Address of the pointers array
mov esi, [ebx]
mov edi, S ; Address of the destination string
mov ebp, 5 ; Number of remaining pointers
mov edx, ebp ; Inner loop counter
jmp .c
db (16-($+21) and 15) dup 0 ; 16-byte aligning `.c` (21 = size in bytes of the code from .a up to .c)
.a: dec ebp
dec edx
jz .d ; Nothing to copy (is last pointer)
mov esi, ebx
mov ecx, edx
.b: mov eax, [esi+4]
mov [esi], eax
add esi, 4
dec ecx
jnz .b
mov esi, [ebx]
.c: movzx eax, byte [esi]
cmp al, 10
je .a
inc esi
mov [ebx], esi
add ebx, 4
stosb
mov esi, [ebx]
dec edx
jnz .c
.d: mov ebx, P
mov esi, [ebx]
mov edx, ebp
cmp ebp, 1
ja .c ; Continue while at least 2 strings remain
jb .f ; Done if none remains
movzx eax, byte [esi] ; Copy remainder of last-remaining string quickly
cmp al, 10
je .f
.e: inc esi
stosb
movzx eax, byte [esi]
cmp al, 10
jne .e
.f: ...
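Here is a C model of Solution 3b. It shows the count-based array and the early exit that copies the last remaining string without further bookkeeping. It is a sketch:
- `interleave_counted` is a hypothetical name;
- memmove stands in for the hand-written pointer-shifting loop .b;
- the output buffer is assumed to be large enough.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* C model of Solution 3b (hypothetical name). ptrs[] holds 'count' cursors
   (no terminating NULL) into strings that end with '\n' (10). */
size_t interleave_counted(const char *ptrs[], size_t count, char *out)
{
    char *dst = out;

    while (count > 1) {                       /* cmp ebp, 1 / ja .c */
        size_t k = 0;
        while (k < count) {                   /* EDX: slots left in this pass */
            char c = *ptrs[k];
            if (c == '\n') {                  /* .a/.b: drop slot k */
                memmove(&ptrs[k], &ptrs[k + 1],
                        (count - k - 1) * sizeof ptrs[0]);
                count--;                      /* dec ebp */
                continue;                     /* slot k now holds the next cursor */
            }
            ptrs[k]++;                        /* update the current cursor */
            k++;
            *dst++ = c;
        }
    }

    if (count == 1) {                         /* .e: copy the remainder verbatim */
        const char *p = ptrs[0];
        while (*p != '\n')
            *dst++ = *p++;
    }

    *dst = '\0';
    return (size_t)(dst - out);
}
```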
| Solution 1b | Solution 3b | Comment |
|---|---|---|
| 0.3875 µsec (0.4) | 0.6681 µsec (1.1) | 5 short strings |
| 1.1866 µsec (2.0) | 0.9033 µsec (1.5) | with 1 long string |
Figures in parentheses are the earlier timings of Solution 1 and Solution 3.
Solution 1 can deal with at most 5 strings.
Solution 3 can deal with any number of strings.