What's the purpose of PUSH CS / POP DS before a REP MOVSW?

Question

Why does the code below push the code segment (PUSH CS) and then pop it into the data segment (POP DS)?

I am giving these lines explicitly as line1 and line2. Please let me know how MOVSW is working here.

IF  HIGHMEMORY
PUSH DS
MOV BX, DS
ADD BX, 10H
MOV ES, BX
PUSH CS.           ;line1
POP DS.            ;line2
XOR SI, SI
MOV DI, SI
MOV CX, OFFSET SYSSIZE  +  1
SHR CX, 1
REP MOVSW.    ;line3
POP DS
PUSH ES
MOV AX, OFFSET SECONDRELOCATION
PUSH AX
AAA PROC FAR
RET
AAA ENDP 
SECONDRELOCATION:
more code here..............

Are the `.` characters in `push cs.` and `rep movsw.` meaningful in some assembler? It gives an error in NASM as expected: `symbol 'CS.' undefined`. I didn't edit it out, but I'm assuming it was incorrectly added as part of marking the lines with comments, not in the original source. — Peter Cordes, Dec 04 '18 at 07:19
It moves code, not data, so setting up the registers for rep mov like this is pretty normal. https://en.wikipedia.org/wiki/High_memory_area — Hans Passant, Dec 04 '18 at 07:29
@HansPassant : The code the OP is using appears to be from MSDOS 2.0 which predates HIMEM.SYS and the concept we know as the High Memory Area (HMA). Looking at the code it appears some variants of DOS 2.0 could be built that had a free memory area in lower memory (memory still under 1mb) that was not part of the memory that DOS could use to allocate for running programs. — Michael Petch, Dec 04 '18 at 10:54
Can anyone tell me please, where is bios file in github ms dos repository — Vstbutwhy, Dec 26 '18 at 01:45

Peter Cordes · Answer 1 · 2019-08-27T14:24:47.077

6

Temporarily setting DS = CS and then restoring it looks like an inefficient alternative to using a CS override prefix on rep movsw.

A segment override can change the source for movsw from DS:SI to CS:SI. (The ES:DI destination can't be overridden.)

(update: on original 8086/8088, there was a hardware "bug" / anomaly: on resuming from an interrupt that happened during a REP-string instruction, IP would point to the last prefix of an instruction, not the first. So depending on the encoding, cs rep movsw would either decode as rep movsw or cs movsw. See @MichaelPetch's comments, and https://www.pcjs.org/pubs/pc/reference/intel/8086/ for more 8086 errata and anomalies that have been fixed in later x86 CPUs.)
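To make the prefix-order dependence concrete, here is a toy Python model of the anomaly (not a CPU emulator). It assumes, as described above, that after an interrupt the CPU resumes decoding at the last prefix byte; the helper names `decode` and `decode_after_interrupt` are made up for illustration.

```python
# Toy model of the 8086/8088 anomaly described above (not a CPU emulator).
# Assumption: after an interrupt, decoding resumes at the LAST prefix byte,
# so any earlier prefix byte is no longer seen.

PREFIX_NAMES = {0xF3: "rep", 0x2E: "cs"}  # REP and CS-override prefix bytes
MOVSW = 0xA5                               # opcode byte for movsw

def decode(encoding):
    """Return the mnemonic seen when decoding starts at the first byte."""
    *prefixes, opcode = encoding
    assert opcode == MOVSW
    return " ".join([PREFIX_NAMES[p] for p in prefixes] + ["movsw"])

def decode_after_interrupt(encoding):
    """Return the mnemonic seen on resume: only the last prefix survives."""
    return decode(encoding[-2:])

rep_first = [0xF3, 0x2E, 0xA5]  # REP, then CS override, then movsw
cs_first = [0x2E, 0xF3, 0xA5]   # CS override, then REP, then movsw

print(decode_after_interrupt(rep_first))  # REP is lost: the copy stops early
print(decode_after_interrupt(cs_first))   # CS is lost: the source becomes DS:SI
```

Either order misbehaves in this model, just in different ways, which is consistent with the code avoiding the two-prefix form entirely.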

This code is doing a memcpy(dst, code_segment, sizeof(code_segment)), where the dst segment:offset is (DS + 16):0. The instructions before rep movsw set ES = DS + 16 (via BX) and set SI = DI = 0.
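As a rough sketch of the address arithmetic, the Python below models real-mode segment:offset translation and the `MOV BX, DS` / `ADD BX, 10H` / `MOV ES, BX` sequence from the question. The function names and the example value of DS are hypothetical.

```python
def linear(segment, offset):
    """Real-mode physical address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + (offset & 0xFFFF)) & 0xFFFFF

def destination_segment(ds):
    """Model MOV BX,DS / ADD BX,10H / MOV ES,BX with 16-bit wrap-around."""
    return (ds + 0x10) & 0xFFFF

ds = 0x2000                      # hypothetical value of DS on entry
es = destination_segment(ds)

# Adding 10h to a segment register moves the base up by 10h paragraphs,
# i.e. 100h (256) bytes, so ES:0 names the same byte as DS:100h.
print(hex(es), hex(linear(es, 0)), hex(linear(ds, 0x100)))
```

So the copy lands 256 bytes above the start of the original data segment, with the source read from offset 0 of the code segment.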

Then the code jumps to the new location, using a far ret after pushing the destination segment (ES) and an offset within it. (push offset SECONDRELOCATION would work, but only on 186+. This DOS code needs to maintain backwards compat with 8086, unfortunately.)

Apparently this assembler doesn't support syntax like ret far or retf, so they have to assemble a far ret instruction by declaring a proc far around the ret instruction. AAA is a very weird name for that proc, because aaa is also a valid x86 instruction mnemonic (ASCII Adjust after Addition).

So execution continues at the SECONDRELOCATION: label in the copy of the code we just made.
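The push/push/far-ret trick can be sketched as a tiny stack model in Python. This is only an illustration: the function name and the example segment and offset values are made up, and a plain list stands in for the stack.

```python
def far_jump_via_retf(es, target_offset):
    """Model PUSH ES / PUSH AX / far RET as stack operations."""
    stack = []
    stack.append(es)             # PUSH ES: segment goes on first
    stack.append(target_offset)  # PUSH AX: offset goes on last, so it is on top
    ip = stack.pop()             # a far RET pops the new IP first...
    cs = stack.pop()             # ...and then the new CS
    return cs, ip

# Hypothetical values: ES = 2010h, offset of SECONDRELOCATION = 0123h
cs, ip = far_jump_via_retf(0x2010, 0x0123)
print(hex(cs), hex(ip))
```

Because the offset is unchanged and only the segment differs, execution lands on the same label, but inside the relocated copy.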

(size+1) / 2 rounds up to a whole number of words, unless the size wraps in which case it copies zero bytes instead of 64k. (Unlike loop, rep checks the count before executing once.)
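A quick Python check of that rounding, assuming the `+ 1` wraps as 16-bit arithmetic (the helper name `word_count` is made up for illustration):

```python
def word_count(size):
    """Model MOV CX, OFFSET SYSSIZE + 1 / SHR CX, 1 with 16-bit registers."""
    cx = (size + 1) & 0xFFFF   # the +1 wraps to 0 when size is 0FFFFh
    return cx >> 1             # logical shift right by one = divide by 2

for size in (5, 6, 0xFFFE, 0xFFFF):
    print(hex(size), word_count(size))
```

Odd sizes round up to the next whole word and even sizes are unchanged, while a size of 0FFFFh wraps to a count of zero, so `rep movsw` would copy nothing.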

Doing the shr at runtime is also dumb, and could have been done at assemble time using something like mov cx, (offset endcode - startcode + 1) / 2. (You probably can't divide an offset result by 2, but you can find the distance between two labels in the same section at assemble time.)

Anyway, probably the point is to relocate the code into HIGHMEM, leaving low memory free for use by programs that can't use HIMEM.

edited Aug 27 '19 at 14:24

answered Dec 04 '18 at 02:27

Peter Cordes


Likely reason for not using `REP CS:MOVSW` is b/c of an anomaly on 8086/8088 processors. When an interrupt occurred during the operation, upon return it would continue by only using the last prefix (which is the segment prefix in this case). The amazing result is that the `REP` prefix is actually ignored when the interrupt is finished. There were workarounds. This kind of code would have been one mechanism (reduce it to one prefix). Another was to restart the instruction if CX was still nonzero. That required looping back and restarting the instruction until CX became zero. – Michael Petch Dec 04 '18 at 06:12
The other option if it fit the task was to disable interrupts and reenable them. – Michael Petch Dec 04 '18 at 06:13
@MichaelPetch: Wow, that's so clunky. If that wasn't just a design bug, the limitations of the low transistor budget really show through there. Interestingly, GAS assembles `cs rep movsw` to `2e f3 a5` (in `.code16`), so the last prefix in that case would be the REP, and it would copy from the wrong source if it dropped the CS override! (NASM and YASM assemble as you describe, to F3 2E A5.) – Peter Cordes Dec 04 '18 at 06:24
Correct, I didn't mention it but as you point out you rely on the order the assembler may generate the prefixes and there was no guarantee. MASM I believe would have output them in the order they appeared, but to guarantee it across different assembler (and there weren't many at the time) it was best to encode the prefixes with the `DB` directive. If you were using MASM then you were okay. – Michael Petch Dec 04 '18 at 06:27
Did some digging and it appears that someone has tried to document these *features* (I call them *anomalies*, a polite way of avoiding the term *bug*). It isn't official, but looking through them it appears they have caught many of the things we were concerned about back then: https://www.pcjs.org/pubs/pc/reference/intel/8086/ – Michael Petch Dec 04 '18 at 06:53
@MichaelPetch: thanks for the link, good to have that collected in one place. I added it to the [x86 tag wiki](https://stackoverflow.com/tags/x86/info). – Peter Cordes Dec 04 '18 at 07:32
On a side note (I forgot to mention it to you last night, but I commented to Hans about it under the question): High Memory in this context isn't what we consider it today. The OP's code snippet comes from DOS 2.0, which predates the development of HIMEM.SYS. HIGH memory in this context is memory in the top part of usable RAM under 1 MB that was not under the control of the DOS loader. It was used for DOS builds for platforms that were not 100% PC compatible (like the DEC Rainbow), where a special OEM BIOS had to be used for compatibility. – Michael Petch Dec 04 '18 at 18:52
The source of the OPs code can be found here: https://github.com/Microsoft/MS-DOS/blob/master/v2.0/source/SYSINIT.ASM – Michael Petch Dec 04 '18 at 18:52
"(Inefficiently: `push offset SECONDRELOCATION` should work just fine.)" The push instructions with an immediate parameter are 186+ instructions. MS-DOS (including up to version 6.22) runs on an 8086 though. So it cannot use an 186 instruction. – ecm Aug 27 '19 at 14:17
@ecm: thanks, fixed. That makes more sense than a missed optimization. I keep forgetting about 8086 not having `push imm`, I usually only remember immediate shifts. :P – Peter Cordes Aug 27 '19 at 14:24
push imm, imul with 3 (or 2) operands, shift/rotate with non-1 immediate, enter, leave, bound, insb/insw/outsb/outsw. I think that's all of the 186 additions. I recently implemented all other than the shifts/rotates in https://github.com/ecm-pushbx/8086tiny – ecm Aug 27 '19 at 14:29

paxdiablo · Answer 2 · 2018-12-04T06:25:59.083

The sequence push cs, pop ds is simply a way to set your data segment to the same value as your code segment.

It's similar to using push ax, pop bx instead of mov bx, ax, other than the fact that it uses memory and may have a different effect on certain flags, something I couldn't be bothered checking when my intent is only to provide an example :-)

One reason you would do this dates back to the old days of x86 segmented architecture (as opposed to the more modern selectors), which is rarely used nowadays. The x86 had various memory models such as tiny, small, compact, medium, large and huge.

These were basically variations on the sizes and quantities of code and data segments that you could use and, from memory, tiny meant that you had one segment that contained both code and data.

Hence cs and ds should be set to the same value so that all instructions operated on that segment by default.

In your particular case, you're saving ds, setting it to the same value as cs then restoring it. See below for a more likely explanation of why.

As to the workings of movsw, it simply copies a single word value from the memory at ds:si to address es:di, updating the pointers afterward (increment or decrement, depending on the setting of the direction flag).

The rep prefix does that in a loop, decrementing cx until it reaches zero.

Hence it's just a bulk memory copy.
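For illustration, here is a small Python model of that loop over a 1 MiB `bytearray`. It is a sketch rather than an emulator: the function names are made up, segment and offset values in the usage are hypothetical, and interrupts and segment overrides are ignored.

```python
def linear(segment, offset):
    """Real-mode physical address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + (offset & 0xFFFF)) & 0xFFFFF

def rep_movsw(mem, ds, si, es, di, cx, df=0):
    """Copy cx words from ds:si to es:di; return the final (si, di, cx)."""
    step = -2 if df else 2           # direction flag set means count downward
    while cx != 0:                   # rep tests cx before every iteration
        lo = mem[linear(ds, si)]     # read the whole word first...
        hi = mem[linear(ds, si + 1)]
        mem[linear(es, di)] = lo     # ...then write it to the destination
        mem[linear(es, di + 1)] = hi
        si = (si + step) & 0xFFFF    # both pointers move by one word
        di = (di + step) & 0xFFFF
        cx -= 1
    return si, di, cx

mem = bytearray(1 << 20)                      # 1 MiB of real-mode memory
mem[0x10000:0x10004] = b"\x11\x22\x33\x44"    # two words at 1000h:0000h
print(rep_movsw(mem, 0x1000, 0, 0x2010, 0, 2))
print(mem[0x20100:0x20104].hex())
```

With the direction flag clear, both pointers end up just past the copied data and cx ends at zero; a starting count of zero copies nothing.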

Now, since the source of rep movsw is specified in terms of the ds segment, the real reason why you're seeing the push/pop to set ds temporarily becomes clear - it's because the source of the data obviously lies in the code segment.

push/pop and mov never touch flags. You can't `mov` directly from one segment reg to another, but they could have used CX as a temporary. push/pop saves 2 bytes (1 byte instructions vs. 2 byte mov). But this code wastes so many bytes elsewhere that IDK what the point is. — Peter Cordes, Dec 04 '18 at 02:29
Since this question is about real-mode code, there’s no reason to talk about segments in the past tense. — prl, Dec 04 '18 at 04:23
Can anyone tell me please, where is bios file in github ms dos repository. — Vstbutwhy, Dec 26 '18 at 01:59
@Vstbutwhy, if you mean the actual BIOS, that has nothing to do with MSDOS. It's to do with the hardware itself and was responsible for (eventually) loading MSDOS. It does this by loading and executing the boot sector which, if it's an MSDOS one, will then load IO.SYS. This is the first *file* loaded as part of MSDOS. — paxdiablo, Dec 26 '18 at 09:17
