On the 8086 (and 8088), memory is addressed with two 16-bit values: a segment register and an offset (a register or an immediate value). The physical address is calculated by taking the segment value, shifting it 4 bits to the left (multiplying it by 16) and adding the offset. So if you used DS as the segment register and BX as the offset register (AX cannot be used as an address register on the 8086), the physical memory address would be (DS * 16) + BX.
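As a rough illustration of that calculation, here is a small portable sketch. The function name linear_address is made up for this example, and it deliberately leaves out the 20-bit wrap-around of the real hardware.

```c
#include <stdint.h>

/* Hypothetical helper: physical address formed from a segment:offset
   pair, i.e. (segment * 16) + offset. No 20-bit wrap-around is applied. */
uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

For example, linear_address(0xB800, 0x0000) gives 0xB8000.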
The 8086 provides four registers to hold the segment value for memory access: DS (Data Segment), SS (Stack Segment), CS (Code Segment) and ES (Extra Segment). Which one is used depends on the instruction. Instruction fetches are always relative to CS.
Note that segments can overlap, so different segment:offset combinations can reference the same memory. For instance [aaaa:0000], [aaa9:0010], [aaa8:0020] and [aaa7:0030] all reference the address 0xaaaa0.
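To make the overlap concrete, the following sketch checks whether two segment:offset pairs name the same byte. The names to_linear and same_location are invented for this example, and wrap-around at 1 MB is ignored.

```c
#include <stdint.h>

/* Hypothetical helper: (segment * 16) + offset, without wrap-around. */
static uint32_t to_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Returns 1 when both pairs refer to the same physical byte, else 0. */
int same_location(uint16_t seg_a, uint16_t off_a,
                  uint16_t seg_b, uint16_t off_b)
{
    return to_linear(seg_a, off_a) == to_linear(seg_b, off_b);
}
```

With this, same_location(0xAAAA, 0x0000, 0xAAA7, 0x0030) is true, because moving the segment down by 3 is cancelled by moving the offset up by 3 * 16.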
The address bus of the 8086 is 20 bits wide, so if the address calculation overflows, it just wraps around. So [ffff:0010] would be an obfuscated way of writing [0000:0000]. The 80286 added a wider address bus, so the calculation no longer wrapped around in real mode (its 8086-compatible mode). For the IBM PC/AT to stay compatible with the original PC, IBM added the A20 gate, a bit of weirdness controlled through the keyboard controller chip that can force address line 20 low. (But this is getting WAY off topic.)
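The wrap-around can be modelled in a few lines. This is only a sketch: physical_address and its a20_enabled flag are names invented here, with the flag standing in for the state of the A20 gate.

```c
#include <stdint.h>

/* Hypothetical model of real-mode address generation.
   a20_enabled == 0: 8086 behaviour (or a PC/AT with the A20 gate
   closed), bit 20 is dropped so addresses wrap at 1 MB.
   a20_enabled != 0: 80286 and later with the gate open, bit 20 is kept. */
uint32_t physical_address(uint16_t segment, uint16_t offset, int a20_enabled)
{
    uint32_t address = ((uint32_t)segment << 4) + offset;

    if (!a20_enabled)
        address &= 0xFFFFFul;   /* keep only address lines 0..19 */
    return address;
}
```

So [ffff:0010] comes out as 0x00000 with the gate closed and as 0x100000 with it open.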
So if you want to access a memory block starting at 0xB8000, you set DS (or some other segment register) to 0xB800 and access your data relative to that, using an appropriate offset.
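Going the other way, from a physical address to a segment value you could load into DS, might look like the sketch below. The struct seg_off and the function split_address are invented for this example. It picks the pair with the smallest offset, which is just one of the many valid choices.

```c
#include <stdint.h>

/* Hypothetical segment:offset pair. */
struct seg_off
{
    uint16_t segment;
    uint16_t offset;
};

/* Splits an address below 1 MB so that the offset is in 0..15. */
struct seg_off split_address(uint32_t linear)
{
    struct seg_off result;

    result.segment = (uint16_t)(linear >> 4);    /* address / 16 */
    result.offset  = (uint16_t)(linear & 0xF);   /* address % 16 */
    return result;
}
```

For 0xB8000 this yields segment 0xB800 and offset 0.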
Naturally, when programming in C you don't have to mess with registers directly, but you still have to understand the segmented memory model of the 8086.
Typically a C compiler for the 8086 lets you compile your code using one of several memory models. The three that matter here are:
Tiny memory model - Here all data and code are in the same segment (DS = SS = CS = ES). Pointers are 16 bit and contain the offset in the segment. Size of code and data may only be 64k total.
Small memory model - Similar to the tiny memory model, but with one segment for data and one for code (DS = SS = ES != CS). Pointers are still 16 bits. Size of code and data can be max 64k each.
Large memory model - Pointers are 32 bit and hold both the segment and the offset part of a memory address. Pointer arithmetic is done by modifying the offset part of a pointer only. So even though the program can access the whole 1 MB of address space, a single data object in C (including structs and arrays) can be at most 64k, and a full 64k object has to start at offset 0 of its segment, i.e. on a 16-byte boundary.
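The consequence of offset-only pointer arithmetic is easiest to see in a simulation. The sketch below is not how any particular compiler implements it. The struct far_ptr and both functions are invented for this example, and normalized_add is included only for contrast, to show what adjusting the segment as well would look like.

```c
#include <stdint.h>

/* Hypothetical 32-bit segment:offset pointer. */
struct far_ptr
{
    uint16_t segment;
    uint16_t offset;
};

/* Offset-only arithmetic: the segment never changes, so the offset
   silently wraps around after 64k. */
struct far_ptr far_add(struct far_ptr p, uint16_t bytes)
{
    p.offset = (uint16_t)(p.offset + bytes);
    return p;
}

/* For contrast: add on the full 20-bit address, then split it again
   so that the offset ends up in 0..15. */
struct far_ptr normalized_add(struct far_ptr p, uint32_t bytes)
{
    uint32_t linear = (((uint32_t)p.segment << 4) + p.offset + bytes) & 0xFFFFFul;

    p.segment = (uint16_t)(linear >> 4);
    p.offset  = (uint16_t)(linear & 0xF);
    return p;
}
```

Starting from [b800:ffff], far_add with 1 gives [b800:0000], which is back at the start of the same segment, while normalized_add gives [c800:0000], the byte that actually follows.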
The text-mode video memory on the original PC started at 0xb8000, if I recall correctly (that is the address for color adapters; the monochrome adapter used 0xb0000). That would be segment 0xb800. To be able to access it from C we need 32-bit segment:offset pointers, i.e. the last of the memory models above. In text mode, each character is stored as two bytes: the character code first, then the attribute byte.
Putting all that together, and assuming the video controller has been set to 80x25 alphanumeric mode, we can write to the screen like this:
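#include <string.h> /* memcpy */

/* One text-mode cell: the character code is at the even address,
   the attribute at the odd address. */
struct char_cell
{
    unsigned char char_code;
    unsigned char attribute;
};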
/*
How to handle pointers is compiler dependent, but we assume here that
the segment value is stored in the high 16 bits. So memory address
0xb8000 or [b800:0000] can be written as 0xb8000000
*/
#define video_buffer ((struct char_cell *)0xb8000000ul)
void put_char(int line, int column, int ch)
{
    video_buffer[line * 80 + column].char_code = (unsigned char)ch;
}

int get_char(int line, int column)
{
    return video_buffer[line * 80 + column].char_code;
}
/*
Saving/restoring buffer doesn't handle saving/restoring cursor position.
This must be done separately.
*/
void save_video_buffer(struct char_cell buffer[80 * 25])
{
    memcpy(buffer, video_buffer, sizeof(*buffer) * 80 * 25);
}

void restore_video_buffer(struct char_cell buffer[80 * 25])
{
    memcpy(video_buffer, buffer, sizeof(*buffer) * 80 * 25);
}
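If you don't have a DOS compiler at hand, the same logic can be tried on any machine by letting an ordinary array stand in for the video memory. This is only a simulation sketch: every sim_ name is invented here, and nothing in it touches real hardware.

```c
#include <string.h>

#define SIM_COLUMNS 80
#define SIM_LINES   25

/* Same layout as a text-mode cell: character code, then attribute. */
struct sim_cell
{
    unsigned char char_code;
    unsigned char attribute;
};

/* Ordinary array standing in for the memory at [b800:0000]. */
static struct sim_cell sim_screen[SIM_LINES * SIM_COLUMNS];

void sim_put_char(int line, int column, int ch)
{
    sim_screen[line * SIM_COLUMNS + column].char_code = (unsigned char)ch;
}

int sim_get_char(int line, int column)
{
    return sim_screen[line * SIM_COLUMNS + column].char_code;
}

/* buffer must have room for SIM_LINES * SIM_COLUMNS cells. */
void sim_save(struct sim_cell *buffer)
{
    memcpy(buffer, sim_screen, sizeof(sim_screen));
}

void sim_restore(const struct sim_cell *buffer)
{
    memcpy(sim_screen, buffer, sizeof(sim_screen));
}
```

Writing a character, saving, overwriting it and restoring should bring the first character back, which is an easy way to check the save/restore pair before running anything on a real DOS machine.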
It has been a long time since I did MS-DOS programming, so some of the details may be a little wrong.