0

So, I'm designing my own virtual CPU. I have some registers and memory and can execute some rudimentary instructions. But, now I'm stuck.

How can I differentiate (in my assembled "machine code") between:

LDA $02 ; Load the hex value 0x02 into register A

and

LDA B   ; Load the value of B into A

Right now I have encoded the operand ($02 and B) both as a value of 0x02. The instruction LDA is encoded as a single Word (uint16 at this point).

This is obviously ambiguous, since the CPU cannot tell the two apart. What is the best way to work around this? I think I have the following options:

  1. Somehow encode into the 16 bits of the instruction whether we're dealing with a value or a register (or later a memory location)
  2. Create different instructions for different operands. E.g. LOADIA, LOADRA, LOADMA for literals, registers and memory respectively.

IMHO option 1 would be best. Can you confirm 1 is a valid option, or suggest other methods of handling this problem? Thanks!

Peter Cordes
Ariejan
  • If `B` is a register then those two instructions are using different addressing modes (immediate vs register) and should probably translate into two separate opcodes. – Michael Jul 01 '14 at 07:06
  • Consider that there is (or may be) a difference between the mnemonics you use when writing source code and the instruction patterns executed by your processor, which are the result of translating your source code. The mnemonics may be identical (with differing operands) while the resulting opcodes are different. In fact, the two options you see are essentially the same: whether you mark the operand type with some bits or create a different opcode, either will yield a different instruction pattern. Think of "a different opcode is an instruction with some bits changed". – Deleted User Jul 01 '14 at 08:20
  • @Bushmills - thanks, this makes perfect sense. I cannot accept this as an answer though. – Ariejan Jul 01 '14 at 09:26
  • Doesn't matter. What's relevant is that you're helped, not that some virtual currency is exchanged. – Deleted User Jul 01 '14 at 09:34

3 Answers

3

First convert the mnemonic to long hand.

lda $02 becomes ld a $02 becomes load immediate a with 02

lda b becomes ld a b becomes copy register a from b

As you can see, lda on its own is not necessarily the instruction; you need to look at the whole line, operands included. Also, if you keep all instructions the same length, you can get better performance (on real processors).
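To make that concrete, here is a minimal sketch in Python. It assumes a hypothetical fixed 16-bit instruction word with the opcode in the high byte and the operand in the low byte. The opcode values, the register numbering and the `assemble`/`execute` helpers are all invented for illustration and are not part of any real instruction set.

```python
# Hypothetical opcodes: one per addressing mode of the same "lda" mnemonic.
LDA_IMM = 0x01  # load immediate a with <value>
LDA_REG = 0x02  # copy register a from <register>

# Assumed register numbers; B is 2, matching the encoding in the question.
REGISTERS = {"A": 0, "B": 2}


def assemble(mnemonic, operand):
    """Pick the opcode from the operand syntax and pack a 16-bit word."""
    if mnemonic.upper() != "LDA":
        raise ValueError(f"unknown mnemonic: {mnemonic}")
    if operand.startswith("$"):  # "$02" is a hex literal, so immediate mode
        return (LDA_IMM << 8) | (int(operand[1:], 16) & 0xFF)
    return (LDA_REG << 8) | REGISTERS[operand.upper()]


def execute(word, regs):
    """Decode one 16-bit word and update the register list in place."""
    opcode, operand = word >> 8, word & 0xFF
    if opcode == LDA_IMM:
        regs[0] = operand
    elif opcode == LDA_REG:
        regs[0] = regs[operand]
    else:
        raise ValueError(f"unknown opcode: {opcode:#04x}")
```

Both source lines use the same mnemonic, but `assemble("lda", "$02")` and `assemble("lda", "b")` produce different words, so the decoder never has to guess what the operand byte means. Every instruction is one word long, which keeps fetching simple.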

Have a look at the ARM processor, it is well documented, clean and still used (a lot). http://simplemachines.it/doc/arm_inst.pdf

ctrl-alt-delor
  • Thanks! I have implemented it like this. I call the instruction 'MOV', but I have different instruction codes (0x01, 0x02, 0x03 etc) for register to register or immediate to register etc. – Ariejan Jul 02 '14 at 09:25
1

x86 CPUs use so-called ModR/M bytes (postbytes) to describe the operands. The overall instruction layout is:

Instruction Prefix                0 or 1 Byte
Address-Size Prefix               0 or 1 Byte
Operand-Size Prefix               0 or 1 Byte
Segment Prefix                    0 or 1 Byte
Opcode                            1 or 2 Byte
Mod R/M                           0 or 1 Byte
SIB, Scale Index Base (386+)      0 or 1 Byte
Displacement                      0, 1, 2 or 4 Byte (4 only 386+)
Immediate                         0, 1, 2 or 4 Byte (4 only 386+)

Format of the postbyte (ModR/M in Intel's terminology)
------------------------------------------
MM RRR MMM

MM  - Memory addressing mode
RRR - Register operand address
MMM - Memory operand address

RRR Register Names
Field  8bit  16bit  32bit
000    AL     AX     EAX
001    CL     CX     ECX
010    DL     DX     EDX
011    BL     BX     EBX
100    AH     SP     ESP
101    CH     BP     EBP
110    DH     SI     ESI
111    BH     DI     EDI
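As a rough illustration of the postbyte layout and register table above, the following Python sketch splits a ModR/M byte into its three fields and looks up a register name. The function and table names (`split_modrm`, `reg_name`, `REG_NAMES`) are made up for this example.

```python
# Register names indexed by the 3-bit field value, per operand size in bits.
REG_NAMES = {
    8: ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"],
    16: ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"],
    32: ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"],
}


def split_modrm(byte):
    """Split a postbyte laid out as MM RRR MMM into (mod, reg, rm)."""
    mod = (byte >> 6) & 0b11   # MM: addressing mode
    reg = (byte >> 3) & 0b111  # RRR: register operand
    rm = byte & 0b111          # MMM: memory operand (or a register if MM=11)
    return mod, reg, rm


def reg_name(field, size):
    """Look up the register name for a 3-bit field and an operand size."""
    return REG_NAMES[size][field]
```

For example, `split_modrm(0xD8)` gives mod 3, reg 3 and rm 0. With MM=11 both fields name registers, so for 32-bit operands they are EBX and EAX.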

---

16bit memory (No 32 bit memory address prefix)
MMM   Default MM Field
Field Sreg     00        01          10             11=MMM is reg
000   DS       [BX+SI]   [BX+SI+o8]  [BX+SI+o16]
001   DS       [BX+DI]   [BX+DI+o8]  [BX+DI+o16]
010   SS       [BP+SI]   [BP+SI+o8]  [BP+SI+o16]
011   SS       [BP+DI]   [BP+DI+o8]  [BP+DI+o16]
100   DS       [SI]      [SI+o8]     [SI+o16]
101   DS       [DI]      [DI+o8]     [DI+o16]
110   SS       [o16]     [BP+o8]     [BP+o16]
111   DS       [BX]      [BX+o8]     [BX+o16]
Note: MMM=110,MM=0 Default Sreg is DS !!!!

32bit memory (Has 67h 32 bit memory address prefix)
MMM   Default MM Field
Field Sreg     00        01          10             11=MMM is reg
000   DS       [EAX]     [EAX+o8]    [EAX+o32]
001   DS       [ECX]     [ECX+o8]    [ECX+o32]
010   DS       [EDX]     [EDX+o8]    [EDX+o32]
011   DS       [EBX]     [EBX+o8]    [EBX+o32]
100   SIB      [SIB]     [SIB+o8]    [SIB+o32]
101   SS       [o32]     [EBP+o8]    [EBP+o32]
110   DS       [ESI]     [ESI+o8]    [ESI+o32]
111   DS       [EDI]     [EDI+o8]    [EDI+o32]
Note: MMM=101,MM=0 Default Sreg is DS !!!!

---

SIB is (Scale/Index/Base)
SS III BBB
Note: SIB address calculated as:
<sib address>=<Base>+<Index>*(2^(Scale))

Field  Default Base
BBB    Sreg    Register   Note
000    DS      EAX
001    DS      ECX
010    DS      EDX
011    DS      EBX
100    SS      ESP
101    DS      o32        if MM=00 (Postbyte)
       SS      EBP        if MM<>00 (Postbyte)
110    DS      ESI
111    DS      EDI

Field Index
III   register   Note
000   EAX
001   ECX
010   EDX
011   EBX
100   none       no index register; SS can be 00
101   EBP
110   ESI
111   EDI

Field Scale coefficient
SS   =2^(SS)
00   1
01   2
10   4
11   8
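The SIB address formula can be sketched in Python as follows. This assumes Intel's bit order for the SIB byte (scale in bits 7-6, index in bits 5-3, base in bits 2-0). The helper names, the `regs` dictionary and the `mod`/`disp32` parameters are hypothetical, and any ModR/M displacement (o8/o32) would still be added on top of the result.

```python
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def split_sib(byte):
    """Split a SIB byte into (scale, index, base) fields."""
    scale = (byte >> 6) & 0b11
    index = (byte >> 3) & 0b111
    base = byte & 0b111
    return scale, index, base


def sib_address(byte, regs, mod=1, disp32=0):
    """Compute <Base> + <Index> * 2^Scale for one SIB byte.

    regs maps register names to values; mod is the MM field of the postbyte.
    """
    scale, index, base = split_sib(byte)
    if base == 0b101 and mod == 0:
        addr = disp32                      # base 101 with MM=00 means o32
    else:
        addr = regs[REG32[base]]
    if index != 0b100:                     # index 100 means "no index"
        addr += regs[REG32[index]] << scale
    return addr & 0xFFFFFFFF               # wrap to 32 bits
```

For instance, the byte 0x98 (binary 10 011 000) selects scale 4, index EBX and base EAX, so with EAX = 0x1000 and EBX = 4 the result is 0x1010.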
  • That's a good explanation, but ModR/M bits are designed for space efficiency and are pretty hard to follow. That's probably overkill for the asker. – Leeor Jul 01 '14 at 11:44
  • Having instructions that vary between 1 and 16 bytes will kill the pipeline. On x86, some of these extension bytes are add-ons to turn an 8-bit processor into a 16-bit one, or is that 32-bit, no, 64-bit. Others are a misguided attempt to keep the average instruction length down. – ctrl-alt-delor Jul 01 '14 at 13:15
  • Thanks for this. Although interesting I don't think this is the answer to my question. – Ariejan Jul 02 '14 at 09:24
1

As you already got some replies, I'll just try to simplify the answers.

You may define your cpu commands in a specific fashion. For example your LDA command may be defined as: 1010101x (binary format) where 1010101 means LDA and the last bit specifies if the next byte is immediate value (0) or register (1).

So in your case it would be:

LDA $02  = 10101010 00000010 
LDA B    = 10101011 00000010

It is just an example, but this is how all processors I know work. For some commands you may use more x bits. You may also have none of them (for example, a NOP instruction).
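A minimal Python sketch of this scheme, using the 1010101x bit pattern from the example above. The helper names and the mode constants are made up for illustration.

```python
LDA_BASE = 0b10101010  # 1010101x with x = 0
MODE_IMMEDIATE = 0     # next byte is an immediate value
MODE_REGISTER = 1      # next byte is a register number


def encode_lda(mode, operand):
    """Return the two bytes for LDA: opcode with mode bit, then operand."""
    return bytes([LDA_BASE | mode, operand & 0xFF])


def decode(code):
    """Decode a two-byte LDA instruction into (mode name, operand)."""
    opcode, operand = code[0], code[1]
    if opcode & 0b11111110 != LDA_BASE:  # mask off the mode bit first
        raise ValueError(f"not an LDA instruction: {opcode:#010b}")
    mode = "register" if opcode & 1 else "immediate"
    return mode, operand
```

Here `encode_lda(MODE_IMMEDIATE, 0x02)` yields the bytes 10101010 00000010 and `encode_lda(MODE_REGISTER, 2)` yields 10101011 00000010, so the decoder can tell the two forms apart by checking the last bit of the first byte.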

Michał Walenciak