
I have an assignment asking me to find the Hamming distance between two user-input strings that are not necessarily equal in length.

So, I made the following algorithm:

  1. Read both strings
  2. check the length of each string
  3. compare the length of the strings
  4. if(str1 is shorter)
    set counter to be the length of str1
    END IF
  5. if(str1 is longer)
    set counter to be the length of str2
    END IF
  6. if(str1 == str2)
    set counter to be length of str1
    END IF
  7. loop through each digit of the strings
    if(str1[digitNo] XOR str2[digitNo] == 1)
        inc al
        END IF
    
  8. the final al value is the hamming distance of the strings, print it.

But I'm stuck at step 3 and can't get it working. Any help?

I tried saving the values in different registers, but none of that worked either.

    ; THIS IS THE CODE I GOT
.model small
.data
str1 db 255 
db ?
db 255 dup(?)
msg1 db 13,10,"Enter first string:$"
str2 db 255
db ?
db 255 dup(?)
msg2 db 13,10,"Enter second string:$"
one db "1"
count db ?
.code
.startup
mov ax,@data
mov ds,ax

; printing first message
mov ah, 9
mov dx, offset msg1
int 21h
; reading first string
mov ah, 10
mov dx, offset str1
int 21h            
; printing second message
mov ah, 9
mov dx, offset msg2
int 21h
; reading second string
mov ah, 10
mov dx, offset str2
int 21h 

; setting the values of the registers to zero
mov si, 0
mov di, 0
mov cx, 0
mov bx, 0 


; checking the length of the first string
mov bl, str1+1
add bl, 30h
mov ah, 02h
mov dl, bl
int 21h


; checking the length of the second string
mov bl, str2+1
add bl, 30h
mov ah, 02h 
mov dh, bl
int 21h  


; comparing the length of the strings
cmp dl,dh 
je equal
jg str1Greater
jl str1NotGreater

; if the strings are equal we jump here
equal:      
mov cl, dl
call theLoop

; if the first string is greater than the second, we jump here and set counter of str1
str1Greater:


; if the second string is greater than the first, we jump here and set counter to length of str2
Str1NotGreater: 



; this is the loop that finds and prints the hamming distance
;we find it by looping over the strings and taking the xor for each 2, then incrementing counter of ones for each xor == 1
theLoop:



end

The code I provided is supposed to print the length of each string (the two lengths are printed next to each other), but it always prints the length of the first string twice. The length of the first string is stored in dl and the length of the second in dh. If I change dh back to dl, it prints the correct length, but I want to compare the lengths, and I don't think that is possible if I store both in dl.

Brian Tompsett - 汤莱恩
  • The string-input functions should return the length of the string somewhere; save it. I don't know why you'd be making another DOS system call with `ah = 02h` in the code you call `checking the length of the first string`. Oh, is that a debug-print of a single character? (That you can look up in an ASCII table if it's greater than 9). That's not what I'd call "checking". I assumed you meant doing a strlen or something to find the end of a 0-terminated string. – Peter Cordes Aug 10 '19 at 13:00
  • Anyway, you just need to set count = min(len1, len2). The easiest way to do that is `reg = len1`, then cmp / jcc over a `reg=len2`. You don't have to care whether they're equal or not. – Peter Cordes Aug 10 '19 at 13:02
  • Also, if the input strings are composed of arbitrary ASCII characters, you probably want to loop over the bits of the XOR result, i.e. a "population count". It might not be exactly `1`. e.g. [NASM: Count how many bits in a 32 Bit number are set to 1](//stackoverflow.com/q/2931497) or [Hamming weight ( number of 1 in a number) mixing C with assembly](//stackoverflow.com/a/27053187) or a 256-byte lookup table. Or a bithack. [Count bits 1 on an integer as fast as GCC \_\_builtin\_\_popcount(int)](//stackoverflow.com/q/51387998) – Peter Cordes Aug 10 '19 at 13:06
  • Thanks for the response. Just to clarify, this is 8086 assembly code for an introductory course I'm taking in college, and we are not allowed to use string functions because they're not in the syllabus. Also, we are asked to find the least number of substitutions between two strings, so if one string is longer than the other, we need to loop only until the end of the shorter string. So we really do have to care whether they are equal or not. – BeginnerCoder Aug 10 '19 at 13:10
  • Yeah, I understand that, when I said `strlen` I meant a loop that stops at a `0` byte, not a function call to the standard library. When you say substitutions, does changing 1 bit count as 1, or does replacing the whole byte count as 1 no matter how different the characters were? If the latter, then you should be checking for `str1[digitNo] != str2[digitNo]`, not their bitwise XOR==1 (which is nonsense unless your alphabet is only ASCII `0` and `1`) or the integer hamming-weight between the ASCII codes. Note that `a != b` is equivalent to `a XOR b != 0` – Peter Cordes Aug 10 '19 at 13:14
  • And I know you don't have to care that they're equal length. In my [earlier comment](https://stackoverflow.com/questions/57442073/how-to-find-the-hamming-distance-for-strings-that-are-not-necessarily-equal-leng?noredirect=1#comment101360861_57442073), the implementation I was describing for `min(len1, len2)` was using `reg = len1` being an assignment, not a comparison. – Peter Cordes Aug 10 '19 at 13:16
  • Thanks @PeterCordes. The loop I’m intending to use does compare digit by digit, if the digit of the first string != the digit of the second, I increment the counter by one until we reach the end of the loop. The idea is there, but I don’t seem to be able to translate it to code. I would appreciate it if someone could help me with the code, because that is my problem. Thanks again. – BeginnerCoder Aug 10 '19 at 13:18
  • I like that you wrote out your ideas in pseudocode so we don't have to wade through all the details. Makes it much quicker to check your logic than trying to wade through messy code. But your pseudocode is wrong when it says `if(str1[digitNo] XOR str2[digitNo] == 1)` because binary XOR doesn't produce a boolean 0 or 1, it produces an 8-bit number. Since you've confirmed that you want to check for !=, write it that way in your pseudocode, and implement it in asm with `cmp` – Peter Cordes Aug 10 '19 at 13:23
  • Anyway, I have no idea what part of implementing this would give you a problem. Given that algorithm, implementing it in assembly is straightforward if you know assembly. Since you apparently don't, you're going to have to explain what you do know if you want anyone to fill in the gaps. Stack Overflow isn't the right site for a back-and-forth hand-holding tutorial, or an open-ended question like "teach me assembly from scratch" without making any assumptions about what you already know. Your question as written is asking about step 3, which has a simple answer. – Peter Cordes Aug 10 '19 at 13:25
  • @PeterCordes each bit change is counted as 1. Say, string1 is : “Org” and string2 is “asm”, the result should be 7, we convert each digit to its binary representation, and then see number of substitutions required to change it to the other bit from string2, O to binary is 1001111, a to binary is 1100001, number of substitutions here is 4, and so on for each character. We convert character to binary, and then check bit by bit for substitutions. To clarify things more, if string1 was “Orga” and string 2 was “asm” the result must also be 7, because in that case we loop until we reach the end of.. – BeginnerCoder Aug 10 '19 at 13:27
  • ..the shorter string. That’s why knowing the length is important. – BeginnerCoder Aug 10 '19 at 13:28
  • Oh, so you do want the hamming weight (aka popcount) of the XOR between bytes. You don't *convert* characters to binary, they're already in memory/registers as binary. XOR between 2 bytes gives you match / mismatch for 8 bits at once, so you loop over that to count them. (Or use a lookup table or whatever) – Peter Cordes Aug 10 '19 at 13:29
  • And yes I get that you only run this up to `min(len1, len2)` (in bytes). – Peter Cordes Aug 10 '19 at 13:31
  • Anyway, now your XOR == 1 makes more sense; you were picturing it as unpacking individual bits first and then XORing them 1 at a time. But that's highly inefficient and complicated. An easier implementation strategy that will give the same result is to loop over bytes and count set bits in `*p++ XOR *q++` iterating 2 pointers, or an index or whatever. – Peter Cordes Aug 10 '19 at 13:37
  • @PeterCordes I thought of just XORing both strings and then looping over the result until we reach the end of the shorter string(using a counter), but I faced two problems, the first is that I was not able to get it to XOR the strings at all, I tried this: lea si, str1 lea ds, str2 xor str1[si],str2[ds]. But it didn’t work for me. The second problem is that even if first problem was solved, I still can’t figure out how to compare the string’s lengths(because I want to use that in the loop) – BeginnerCoder Aug 10 '19 at 13:40
  • [No x86 instruction can take 2 explicit memory operands](//stackoverflow.com/q/33794169), you need at least one operand to be a register. Besides, you want the result in a register where you can use it before moving on to the next byte. There's no reason to store the XOR result back to memory by using a memory-destination XOR. So `mov al, [si]` ; `inc si` ; `xor al, [di]` ; `inc di` or something like that. (With pointers in SI, you don't want `str1[si]` - that would double the address. Or you could start with `si=0` and use it for both, like `str1[si]` and `str2[si]` so it's just an index) – Peter Cordes Aug 10 '19 at 13:53
  • @PeterCordes How do I implement what you said in code? Been trying to figure it out for the last ~40 mins, was not able to do it. – BeginnerCoder Aug 10 '19 at 14:43
  • In an earlier comment I linked several ways to implement popcount. Adapting those to an 8-bit integer instead of 32-bit should be straightforward, in an inner loop inside your loop over bytes. – Peter Cordes Aug 10 '19 at 14:47

1 Answer


but it seems to always keep printing the length of the first string, twice.

When outputting a character with the DOS function 02h you don't get to choose which register to use to supply the character! It's always DL.

Since after printing both lengths you still want to work with these lengths it will be better to not destroy them in the first place. Put the 1st length in BL and the second length in BH. For outputting you copy these in turn to DL where you do the conversion to a character. This of course can only work for strings of at most 9 characters.


 ; checking the length of the first string
 mov  BL, str1+1
 mov  dl, BL
 add  dl, 30h
 mov  ah, 02h
 int  21h

 ; checking the length of the second string
 mov  BH, str2+1
 mov  dl, BH
 add  dl, 30h
 mov  ah, 02h 
 int  21h  

 ; comparing the length of the strings
 cmp  BL, BH 
 ja   str1LONGER
 jb   str1SHORTER

 ; if the strings are equal we ** FALL THROUGH ** here
equal:      
 mov  cl, BL
 mov  ch, 0
 call theLoop

; !!!! You need some way out at this point (e.g. terminate the program). Don't fall through here !!!!

; if the first string is longer than the second, the counter must be the length of str2
str1LONGER:


; if the first string is shorter than the second, the counter must be the length of str1
str1SHORTER: 


; this is the loop that finds and prints the hamming distance
;we find it by looping over the strings and taking the xor for each 2, then incrementing counter of ones for each xor == 1
theLoop:

Additional notes

  • Lengths are unsigned numbers. Use the unsigned conditions *above* and *below* (`ja`, `jb`) instead of the signed `jg` and `jl`.
  • Talking about longer and shorter makes more sense for strings than greater and not greater.
  • Don't use 3 jumps if a mere fall-through in the code can do the job.
  • Your code in theLoop will probably use CX as a counter. Don't forget to zero CH, either with 2 instructions like I did above or with `movzx cx, BL` if you're allowed to use instructions beyond the original 8086 (`movzx` requires an 80386 or later).
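As suggested in the comments under the question, the three-way branch can be avoided entirely: the counter only needs to be min(len1, len2), and the equal case needs no separate handling. A minimal sketch, assuming `str1` and `str2` are the DOS function 0Ah buffers from the question (so the byte at offset 1 holds the number of characters read); the label `HaveMin` is a made-up name for illustration:

```asm
; Sketch only: CX = min(length of str1, length of str2).
; Assumes str1/str2 are the DOS 0Ah input buffers from the question.
; HaveMin is a hypothetical label name.
 mov  cl, str1+1      ; start with the length of str1
 cmp  cl, str2+1      ; unsigned compare against the length of str2
 jbe  HaveMin         ; str1 is shorter or equal: keep its length
 mov  cl, str2+1      ; otherwise str2 is shorter: use its length
HaveMin:
 mov  ch, 0           ; zero CH so the full CX can serve as loop counter
```

This uses only original 8086 instructions and leaves the loop count in CX, ready for a `loop`-based comparison.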

Bonus

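 ; CX must already hold the length of the shorter string
 mov  si, offset str1+2   ; first character of str1 (skip the 2 header bytes)
 mov  di, offset str2+2   ; first character of str2
 mov  al, 0               ; AL counts the differing characters
MORE:
 mov  dl, [si]
 cmp  dl, [di]
 je   EQ
 inc  al
EQ:
 inc  si
 inc  di
 loop MORE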
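
The loop above counts differing characters. According to the comments under the question, the asker wants the number of differing bits instead, which means counting the set bits in the XOR of each pair of bytes. A possible sketch of that variant, assuming CX already holds the length of the shorter string and `str1`/`str2` are the DOS 0Ah buffers from the question; the labels `NextByte`, `NextBit` and `Done` and the choice of BX as accumulator are illustrative, not taken from the question:

```asm
; Sketch only: bitwise Hamming distance over the first CX bytes.
; Assumes CX = min(len1, len2); labels and register choice are hypothetical.
 mov  si, offset str1+2   ; SI -> first character of str1
 mov  di, offset str2+2   ; DI -> first character of str2
 xor  bx, bx              ; BX = running count of differing bits
 jcxz Done                ; shorter string is empty: nothing to compare
NextByte:
 mov  al, [si]            ; AL = current byte of str1
 xor  al, [di]            ; set bits in AL mark the bit positions that differ
NextBit:
 shr  al, 1               ; move the lowest bit into the carry flag
 adc  bx, 0               ; add that bit to the count
 test al, al              ; any set bits left in this byte?
 jnz  NextBit
 inc  si
 inc  di
 loop NextByte
Done:
 ; BX = number of differing bits
```

For the asker's example, "Org" against "asm", the XOR bytes are 2Eh, 01h and 0Ah, which contain 4 + 1 + 2 = 7 set bits, matching the expected result of 7. The count is kept in BX rather than AL because up to 255 bytes of 8 bits each can exceed 255.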
Fifoernik
  • According to comments, the OP apparently *does* want the bitwise hamming distance. Their XOR == 1 is considering the strings as bit-strings, not byte-strings. So you need to popcount the XOR of non-matching bytes. – Peter Cordes Aug 17 '19 at 14:30
  • Also, you're repeating the OP's mistake of overcomplicating by checking for equal separately. If they're equal, you can take either length and don't need to identify that condition separately. You just need `len = (len1<=len2) ? len1 : len2;` for example. Super simple, mov / cmp/jcc / mov. – Peter Cordes Aug 17 '19 at 14:33