Input consists of characters from a-z or A-Z and is terminated by an asterisk (*).

The output must be the first and last capital letters of the input. The input should also be echoed as it is read. N.B. The input is read character by character, not as a string.

Test case 1: input: aAbCcP* output: AP

Test case 2: input: ZabCBc* output: ZB

I have written this code below, which satisfies Test Case 1, but not 2:

.MODEL
.STACK 100H
.DATA
   STR DB 'Enter letters:$'
.CODE

MAIN PROC

MOV AX, @DATA
MOV DS, AX

LEA DX, STR
MOV AH, 9
INT 21H 

cycle: 

    MOV AH, 1
    INT 21H

    CMP AL, '*'
    JZ output 
    CMP AL, 'Z' 
    JA save


head: 
    CMP BL, 1
    JZ save

    MOV BL, 1
    MOV BH, AL 

clear:
    XOR AL, AL  

save:
    MOV CH, AL

LOOP cycle 

output:
    MOV AH, 2
    MOV DL, BH
    INT 21H 

    MOV AH, 2
    MOV DL, CH
    INT 21H 


MAIN ENDP 
END MAIN 
Peter Cordes
  • What *does* your code print for case 2? This isn't a [mcve]. Also comment your code and/or describe in English what your algorithm is *supposed* to be, which registers are supposed to be holding what. I don't see why you have a `clear:` label when nothing jumps there. You only fall through to there from `head:`. Also it seems you read BL without ever zeroing BL or BX before the start of the loop. Your code appears to depend on BL != 1 on entry to the function. – Peter Cordes Jun 29 '19 at 20:07
  • Also your loop would end early depending on CX, or the `loop` instruction could corrupt CH. I think you want `jmp`, because there's no count-based loop termination condition. `loop` is basically `dec cx` / `jnz` but without updating flags, so obviously you don't want that. – Peter Cordes Jun 29 '19 at 20:10
  • Anyway, the most obvious problem is that CH is *always* updated every trip through the loop. It can only work if the last upper-case character is also the last overall. You might write 2 loops: one that finds the first capital and saves it in one register, then another that keeps track of the last-seen. – Peter Cordes Jun 29 '19 at 22:35

2 Answers

First ask yourself these questions:

  • What are capitals?
    If we don't consider accented characters, then capitals are characters with ASCII codes ranging from 65 to 90.

  • Can I trust the user to only input characters from a-z or A-Z?
    No, you can't. You don't have control over what the user does at the keyboard, which is why your program should take a defensive approach and test for capitals with something better than a single cmp al, 'Z'.

  • What will be the result if the input didn't contain a single capital?
    You could choose to print two spaces, or a descriptive message, or, like I did, display nothing at all.

  • What will be the result if the input contains only one capital?
    You could choose to print that one capital, or, like I did, display it twice, because if you think about it, that single capital is at the same time the first occurrence of a capital and also the last occurrence of a capital.

  • What input/output functions will I use?
    For single character input you have a choice between DOS functions 01h, 06h, 07h, 08h, 0Ch, and 3Fh.
    For single character output you have a choice between DOS functions 02h, 06h, and 40h.
    If you're new to assembly then stick with the simpler ones and use functions 01h and 02h. Do consult the API reference before using any DOS function. And of course check whether emu8086 supports the function at all!

You need to decide about all of the above in order to tackle the task. What is important, is that for every choice you make, you can defend your choice.


Below is my version of this task. For simplicity I'm using the tiny program model. See the ORG 256 directive on top? This program model has the major benefit of having all the segment registers pointing equally to your program (CS = DS = ES = SS).

The program runs 2 loops. The first loop runs until a capital is received. (It goes without saying that it stops early if an asterisk arrives first.) Because that capital is at the same time the first occurrence of a capital and also the last occurrence of a capital, I save it twice, in both DL and DH.

The second loop runs until an asterisk is received. Each time a new capital comes along, it replaces what is stored in DH. When this loop finally ends, DL and DH are displayed on screen, in that order of course.

The program exits with the preferred DOS function 4Ch to terminate a program.

I've written some essential comments, refrained from adding redundant ones, and used descriptive names for the labels in the program. Do note the nice tabular layout. It's crucial for readability.

        ORG     256

Loop1:  mov     ah, 01h     ; DOS.GetKeyboardCharacter
        int     21h         ; -> AL
        cmp     al, "*"     ; Found end of input marker ?
        je      Done
        cmp     al, "A"
        jb      Loop1
        cmp     al, "Z"
        ja      Loop1
        mov     dl, al      ; For now it's the first
        mov     dh, al      ; AND the last capital

Loop2:  mov     ah, 01h     ; DOS.GetKeyboardCharacter
        int     21h         ; -> AL
        cmp     al, "*"     ; Found end of input marker ?
        je      Show
        cmp     al, "A"
        jb      Loop2
        cmp     al, "Z"
        ja      Loop2
        mov     dh, al      ; This is the latest capital
        jmp     Loop2

Show:   mov     ah, 02h     ; DOS.DisplayCharacter
        int     21h         ; -> (AL)
        mov     dl, dh
        mov     ah, 02h     ; DOS.DisplayCharacter
        int     21h         ; -> (AL)

Done:   mov     ax, 4C00h   ; DOS.TerminateWithReturnCode
        int     21h

Example:

aZeRTy*

aZeRTy*ZT
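
For readers who want to check the logic away from DOS, here is a minimal C sketch of the same two-loop idea. It is only an illustration: the function name first_last_capitals is made up for this sketch, and it assumes the whole input is available as a string, whereas the assembly program reads one key at a time.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the assembly program): scans `input` up to
   the '*' terminator and stores the first and last capital in out[0], out[1].
   Returns 2 when capitals were found, 0 (and an empty string) otherwise. */
int first_last_capitals(const char *input, char out[3])
{
    size_t i = 0;
    out[0] = '\0';

    /* Loop 1: skip everything until the first capital or the terminator. */
    for (;; i++) {
        char c = input[i];
        if (c == '*' || c == '\0')
            return 0;                 /* like jumping straight to Done */
        if (c >= 'A' && c <= 'Z')
            break;
    }
    char first = input[i];            /* plays the role of DL */
    char last  = input[i];            /* plays the role of DH */

    /* Loop 2: every later capital overwrites `last` until '*' arrives. */
    for (i++; input[i] != '*' && input[i] != '\0'; i++) {
        if (input[i] >= 'A' && input[i] <= 'Z')
            last = input[i];
    }

    out[0] = first;
    out[1] = last;
    out[2] = '\0';
    return 2;
}
```

Called with "aZeRTy*" this fills the buffer with "ZT", matching the example run above. An input with a single capital yields that capital twice, and an input without capitals yields nothing.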


It would be very disappointing if you took it the easy way and just copy/pasted my code. I've tried to explain it in great detail and hope that you learn a lot from it.

My solution is certainly not the only good solution for this task. You could e.g. first input all of the characters and store them in memory somewhere, after which you process these characters from memory similar to how I did it.
Please try to write a working version that does it in this alternative way. You can only get smarter! Happy programming.
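
As a rough outline of that store-first alternative, here is a C sketch rather than a finished assembly solution. The names store_until_star and scan_stored are invented for illustration, and the input is assumed to come from a string instead of the keyboard. Having everything in memory allows one variation: the last capital can be found by scanning backwards from the end.

```c
#include <stddef.h>

/* Step 1 (hypothetical helper): copy characters into buf until the '*'
   terminator, the end of the source, or the buffer capacity is reached.
   The '*' itself is not stored. Returns the number of characters stored. */
size_t store_until_star(const char *src, char *buf, size_t cap)
{
    size_t n = 0;
    while (n < cap && src[n] != '*' && src[n] != '\0') {
        buf[n] = src[n];
        n++;
    }
    return n;
}

/* Step 2 (hypothetical helper): find the first capital from the front and
   the last capital from the back. Returns 1 on success, 0 if none exists. */
int scan_stored(const char *buf, size_t n, char *first, char *last)
{
    size_t lo = 0;
    while (lo < n && !(buf[lo] >= 'A' && buf[lo] <= 'Z'))
        lo++;
    if (lo == n)
        return 0;                     /* no capital anywhere */

    size_t hi = n;                    /* buf[lo] is a capital, so this stops */
    while (!(buf[hi - 1] >= 'A' && buf[hi - 1] <= 'Z'))
        hi--;

    *first = buf[lo];
    *last  = buf[hi - 1];
    return 1;
}
```

In assembly the buffer would be a DB area in the program and the two scans would walk it with SI or DI, but the structure stays the same.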

Sep Roland
  • The OP already is using `cmp al, 'Z'`. I'm not sure what your first bullet point is about. They're allowed to assume their input is alphabetic, so `c > 'Z'` means lower-case, otherwise upper. Your version complicates it by using extra branches to reject non-alphabetic characters. – Peter Cordes Jun 29 '19 at 22:36
  • But yes, two separate loops seems like the best bet, instead of branching on a flag every time you find a new capital. – Peter Cordes Jun 29 '19 at 22:38
  • @PeterCordes My answer is based on [the previous question](https://stackoverflow.com/questions/56765729/input-is-to-be-taken-from-a-z-or-a-z-we-need-to-have-the-first-and-last-capital) by the OP. I've mistakenly posted it here. I think nonetheless that it remains a valid answer. – Sep Roland Jun 29 '19 at 22:41
  • Yes, that question should get deleted or closed as a dup of this. Maybe simplify that first bullet point because the OP already knows that. Or make a point about excluding non-alphabetic characters, to make the code more complicated but able to handle inputs like `12ABcd7_` – Peter Cordes Jun 29 '19 at 23:27

Your code is broken because you always fall through to save: MOV CH, AL every iteration, so it can only work if the last capital is also the very last character of the whole input.

Single-step it with a debugger for a simple input like ABc* to see how it goes wrong.
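
If you don't have a debugger handy, a small C model of the loop shows the same thing. This is a sketch under two loudly stated assumptions: BL happens to be 0 on entry (the original never initializes it), and the loop instruction is treated as a plain jump, so its side effect on CX is ignored. The struct and function names are made up for the model.

```c
/* Model of the registers the question's code uses. */
struct regs { unsigned char bl, bh, ch; };

/* Hypothetical model of the original loop body.
   Assumptions: BL == 0 on entry, and LOOP behaves like JMP. */
struct regs run_original(const char *input)
{
    struct regs r = {0, 0, 0};
    for (const char *p = input; *p != '*' && *p != '\0'; p++) {
        unsigned char al = (unsigned char)*p;
        if (al <= 'Z') {              /* JA save not taken: a capital */
            if (r.bl != 1) {          /* head: first capital seen */
                r.bl = 1;
                r.bh = al;
                al = 0;               /* clear: XOR AL, AL */
            }
        }
        r.ch = al;                    /* save: runs for EVERY character */
    }
    return r;
}
```

For ABc* the model ends with BH = 'A' and CH = 'c', because the trailing lower-case c overwrote the B. For aAbCcP* it ends with 'A' and 'P', which is why test case 1 appeared to work.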

Also, you use loop, which is like dec cx/jnz. That makes no sense because there's no counter-based termination condition, and could potentially corrupt CH if CL was zero. You don't even initialize CX first! The loop instruction is not the only way to loop; it's just a code-size peephole optimization you can use when it's convenient to use CX as a loop counter. Otherwise don't use it.
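
To make the CH corruption concrete, here is a tiny C model of what loop does to CX. The function name loop_step is invented for this sketch, and it only models the 16-bit decrement and the taken / not-taken decision, nothing else about the instruction.

```c
#include <stdint.h>

/* Hypothetical model of the 8086 LOOP instruction: decrement CX and report
   whether the branch is taken (CX != 0 after the decrement). */
int loop_step(uint16_t *cx)
{
    *cx = (uint16_t)(*cx - 1);
    return *cx != 0;
}
```

With CH = 'B' (42h) and CL = 0, CX is 4200h. One loop_step leaves 41FFh, so CH has silently become 'A'. And if CX ever reaches 1 before the instruction runs, the branch is not taken and execution falls through to the output code even though no asterisk was typed.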


This is a simplified version of Sep's implementation, taking advantage of the fact that the input is guaranteed to be alphabetic, so we really can check for upper case as easily as c <= 'Z' (after ruling out the '*' terminator). We don't have to worry about inputs like 12ABcd7_ or spaces or newlines, which also have lower ASCII codes than the upper-case alphabetic range. Your cmp al,'Z' / ja check was correct, it's just the code you were branching to that didn't have sane logic.

Even if you did want to strictly check c >= 'A' && c <= 'Z', that range check can be done with one branch using sub al,'A' ; cmp al,'Z'-'A' ; ja non_upper instead of a pair of cmp/jcc branches. (That modifies the original, but if you save it in SI or something you could later restore it with lea ax, [si+'A'])
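
The same trick written in C, as a sketch with made-up function names: subtracting 'A' in unsigned 8-bit arithmetic makes every character below 'A' wrap around to a large value, so a single unsigned comparison against 'Z'-'A' covers both ends of the range.

```c
/* One-compare range check, mirroring sub al,'A' / cmp al,'Z'-'A' / ja. */
int is_upper_one_compare(unsigned char c)
{
    /* Characters below 'A' wrap around to values >= 191 here. */
    unsigned char offset = (unsigned char)(c - 'A');
    return offset <= 'Z' - 'A';
}

/* Reference version with the usual pair of comparisons. */
int is_upper_two_compares(unsigned char c)
{
    return c >= 'A' && c <= 'Z';
}
```

Both functions agree for all 256 byte values. Compilers commonly perform this transformation themselves on the two-compare form, but in hand-written assembly you have to do it yourself.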

You can also put a conditional branch at the bottom of the loop for both loops, instead of a jmp at the bottom and an if() break inside. Sep's code already did that for the first loop.

I agree with Sep that having 2 loops is easier than checking a flag every time you find a capital (to see if it's the first capital or not).

        ORG     100h        ; DOS .com is loaded with IP=100h, with CS=DS=ES=SS
                            ; we don't actually do any absolute addressing so no real effect.

        mov     ah, 01h     ; DOS.GetKeyboardCharacter
                            ; AH=01 / int 21h doesn't modify AH so we only need this once
find_first_cap:  
        int     21h         ; stdin -> AL
        cmp     al, '*'     ; Found end of input marker ?
        je      Done        ;  if (c=='*') return;  without printing anything, we haven't found a capital yet

        cmp     al, 'Z'
        ja      find_first_cap
    ; fall through: AL <= 'Z' and we can assume it's a capital letter, not a digit or something.

        mov     dl, al      ; For now it's the first
        ;mov     dh, al      ; AND the last capital

        ;mov     ah, 01h     ; DOS.GetKeyboardCharacter   AH still = 01
        ;jmp     loop2_entry      ; we can let the first iteration set DH
Loop2:                      ; do {
        cmp     al, 'Z'       ; assume all c <= 'Z' is a capital alphabetic character
        ja      loop2_entry
        mov     dh, al        ; This is the latest capital

loop2_entry:
        int     21h         ; stdin -> AL
        cmp     al, '*'
        jne     Loop2       ; }while(c != '*');


Show:   mov     ah, 02h     ; DOS.DisplayCharacter
        int     21h         ; DL -> stdout
        mov     dl, dh
        ; mov     ah, 02h     ; DOS.DisplayCharacter   AH still = 02
        int     21h         ; DL -> stdout

Done:   mov     ax, 4C00h   ; DOS.TerminateWithReturnCode
        int     21h

At this point it's arguably not simpler, but is more optimized especially for code-size. That tends to happen when I write anything because that's the fun part. :P

Having a taken branch inside the loop for the non-capital case is arguably worse for performance. (In modern code for a P6-compatible CPU you'd probably use cmovbe esi, eax instead of a conditional branch, because a conditional move is exactly what you want.)
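
In C terms, the conditional-move version of the second loop is a select instead of an if. This is only a sketch: the function name is made up, it relies on the same alphabetic-input assumption as the code above, and whether a compiler actually emits cmov for it depends on the compiler and its options.

```c
/* Hypothetical helper: returns the last capital before '*', or `initial`
   if there is none. Assumes the input is alphabetic, so c <= 'Z' means
   "capital", as in the assembly version. */
char last_capital_select(const char *input, char initial)
{
    char last = initial;
    for (const char *p = input; *p != '*' && *p != '\0'; p++)
        last = (*p <= 'Z') ? *p : last;   /* candidate for a conditional move */
    return last;
}
```

Passing the first capital as `initial` gives the same result as letting the first iteration set DH.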

Omitting the mov ah, XX before an int 21h because it's still set doesn't make your program more human-readable, but it is safe if you're careful to check the docs for each call to make sure they don't return anything in AH.

Peter Cordes
  • If you were optimizing for code size you would have also removed `Done: mov ax, 4C00h ; DOS.TerminateWithReturnCode` `int 21h` in favor of `Done: ret` . COM programs can be terminated by a simple `ret`. DOS will push a 0000h on the top of the stack before transferring control to CS:100h. CS:0000h is the start of the PSP and contains the instruction `int 20h` which in turn exits back to DOS. – Michael Petch Jun 30 '19 at 00:44
  • @MichaelPetch: heh, I hadn't been thinking about optimizing the stuff outside of the algorithm itself, but I guess in a `.com` file there isn't any other overhead so sure. Does `int 20h` (via ret or not) set exit status = 0? Everything I'm finding with google calls it "without" an exit status, which doesn't make sense unless DOS exit statuses are like flag+value or something that can encode not-a-status instead of a number. – Peter Cordes Jun 30 '19 at 01:02