26

I've been inspired by Fabrice Bellard's implementation of an x86 virtual machine in JavaScript, and I'd like to try writing the simplest possible virtual machine that is capable of running the Linux kernel. This is a purely educational endeavour, with no purpose other than understanding and sharing the code that makes this possible.

Having glanced over the x86 specification, I suspect that I might be throwing myself into the deep end by trying to write a virtual machine that is capable of emulating the complete x86 instruction set. Instead, I'm looking for a simpler architecture that I can attempt to emulate.

I've read through this question which asks how to emulate the x86 architecture, and the answer suggests starting with something simpler, like the ARM architecture. My question is more specific: what is the simplest possible architecture that I can attempt to emulate which will be able to run the Linux kernel?

I'm interested in fully emulating the entire machine, not simply passing instructions back to the host machine (which, for example, would be possible if I were writing an x86 emulator). I have a decent amount of 16-bit assembly knowledge, and some operating systems theory background, so this should be well within reach with enough work.

Richard Keller
  • (+1) You would need a gcc backend for that architecture, right? Since Linux is mostly C, I guess you are equally asking what the simplest backend gcc can support is. – auselen Feb 25 '13 at 20:54
  • You could emulate an ATmega micro and run this: http://www.extremetech.com/extreme/124287-the-worlds-slowest-linux-pc but that might be one level of emulation too far :) – Martin Thompson Feb 26 '13 at 10:32
  • Maybe also look into QEMU (http://en.wikipedia.org/wiki/QEMU). Who knows, you may end up creating a novel emulated architecture. – auselen Feb 26 '13 at 16:07
  • The `ARM` has a simple instruction set. However, the more difficult part to `virtualize` will be the MMU. Either you want to configure without an MMU or you could use a para-virtualized Linux. You will need to emulate many of the ARM `co-processor` registers if you use a **stock** Linux. – artless noise Feb 26 '13 at 23:26
  • Besides the instruction set, there are other aspects to consider: special registers, MMU, TLBs, and generally all the stuff you DON'T see in user mode, but you can (and will) use in kernel mode. – Lorenzo Dematté Mar 05 '13 at 14:01

4 Answers

6

The simplest possible architecture is the one that is easiest to implement. Since you are building an emulator that fully emulates the machine, whichever architecture has the simplest instruction set will be the most suitable, and RISC architectures generally do better here. On the other hand, choosing an architecture that is not widely used is risky: if you need support, few people will be able to help you, and writing a simulator is no piece of cake. I would go for either ARM or MIPS, since both are popular.
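
To give a feel for what "fully emulating the machine" means at its core, here is a minimal sketch of a fetch/decode/execute loop. The instruction set is a made-up toy (the opcodes, field layout, and the names `encode`, `run`, and `SUM_1_TO_5` are all assumptions for illustration, not ARM or MIPS); a real emulator has the same loop shape with a much larger decode step plus memory, interrupts, and privileged state.

```python
# Toy 32-bit machine. This is NOT a real ISA; the layout is invented:
#   bits 31-24 opcode, bits 23-20 rd, bits 19-16 rs, bits 15-0 signed immediate
OP_HALT, OP_ADDI, OP_ADD, OP_BNE = 0, 1, 2, 3


def encode(op, rd=0, rs=0, imm=0):
    """Pack one toy instruction into a 32-bit word."""
    return (op << 24) | (rd << 20) | (rs << 16) | (imm & 0xFFFF)


def sign16(value):
    """Interpret a 16-bit field as a two's-complement signed integer."""
    return value - 0x10000 if value & 0x8000 else value


def run(program, max_steps=10_000):
    """Execute a list of instruction words and return the register file."""
    regs = [0] * 16
    pc = 0
    for _ in range(max_steps):
        word = program[pc]                 # fetch
        op = word >> 24                    # decode
        rd = (word >> 20) & 0xF
        rs = (word >> 16) & 0xF
        imm = sign16(word & 0xFFFF)
        pc += 1
        if op == OP_HALT:                  # execute
            return regs
        elif op == OP_ADDI:                # rd = rs + imm
            regs[rd] = (regs[rs] + imm) & 0xFFFFFFFF
        elif op == OP_ADD:                 # rd = rd + rs
            regs[rd] = (regs[rd] + regs[rs]) & 0xFFFFFFFF
        elif op == OP_BNE:                 # branch relative to the next pc
            if regs[rd] != regs[rs]:
                pc += imm
        else:
            raise ValueError(f"unknown opcode {op}")
    raise RuntimeError("step limit reached")


# Sum the integers 5 + 4 + 3 + 2 + 1 into r2, using r1 as a counter.
SUM_1_TO_5 = [
    encode(OP_ADDI, rd=1, rs=0, imm=5),    # r1 = 5
    encode(OP_ADD, rd=2, rs=1),            # r2 += r1
    encode(OP_ADDI, rd=1, rs=1, imm=-1),   # r1 -= 1
    encode(OP_BNE, rd=1, rs=0, imm=-3),    # loop while r1 != r0 (always 0)
    encode(OP_HALT),
]
```

Calling `run(SUM_1_TO_5)` returns the register list, with the sum in register 2. The simpler the real instruction set, the closer its decode step stays to the handful of shifts and masks shown here.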

Also note that Fabrice Bellard's JavaScript virtual machine emulates a 32-bit x86-compatible CPU, which Linux supports natively. For ARM or MIPS you would have to build the Linux kernel yourself with a cross-compilation toolchain. See the links on how to use the Linux kernel

For MIPS:

For ARM:

user568109
  • +1 for balance between simplicity and resources available. But besides the instruction set, there is another aspect to consider: special registers, MMU, TLBs, and generally all the stuff you DON'T see in user mode, but you can (and will) use in kernel mode – Lorenzo Dematté Mar 05 '13 at 14:00
4

The list of architectures supported by the Linux kernel:

http://en.wikipedia.org/wiki/List_of_Linux_supported_architectures

"Simplest possible" is somewhat subjective, but here are the ones from that list that I think are less complicated:

  • MIPS
  • H8 (μClinux)
  • 68k/ColdFire (μClinux)
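
As an illustration of why MIPS tends to land on lists like this, classic MIPS32 instructions are all 32 bits wide and fall into three basic layouts (R, I and J), so the field extraction fits in a few lines. The sketch below is a rough outline under stated assumptions: the function name `decode_mips` and the dictionary shape are hypothetical, and treating every opcode other than 0, 2 and 3 as I-type is a simplification (coprocessor opcodes, for instance, have their own layouts).

```python
def decode_mips(word):
    """Split a 32-bit MIPS32 instruction word into its basic fields.

    Simplified sketch: only distinguishes the R, I and J layouts.
    """
    opcode = (word >> 26) & 0x3F
    if opcode == 0x00:
        # R-type (SPECIAL): register-to-register operations
        return {
            "format": "R",
            "rs": (word >> 21) & 0x1F,
            "rt": (word >> 16) & 0x1F,
            "rd": (word >> 11) & 0x1F,
            "shamt": (word >> 6) & 0x1F,
            "funct": word & 0x3F,
        }
    if opcode in (0x02, 0x03):
        # J-type: J and JAL carry a 26-bit word-index target
        return {"format": "J", "opcode": opcode, "target": word & 0x03FFFFFF}
    # I-type: the 16-bit immediate is returned raw, because whether it is
    # sign- or zero-extended depends on the instruction (ADDIU vs ORI).
    return {
        "format": "I",
        "opcode": opcode,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "imm": word & 0xFFFF,
    }
```

For example, `decode_mips(0x00851021)` (the encoding of `addu $v0, $a0, $a1`) yields an R-type record with `rs=4`, `rt=5`, `rd=2` and `funct=0x21`.
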
Igor Skochinsky
  • There's RISC and there's RISC. For example, PowerPC is 'RISC' but its instruction set is far from "reduced". ARM's instruction set is also pretty complex nowadays, but MIPS is still reasonably simple. – Igor Skochinsky Feb 28 '13 at 20:37
2

As I said in the comments, I would balance three aspects:

  • Simple instruction set: few instruction formats and few opcodes (anything NOT like x86).
  • Documentation: it should be widely available. This may mean discarding some simple architectures in favour of widely supported ones (x86 wins here, for example, but you will also find a lot of material on RISC, and especially MIPS, from academia). Or go for something open, like OpenRISC.
  • Ease of use in "kernel mode": in privileged mode there is a whole new world of registers, instructions and internals to consider. Do not forget that a processor comes with a bus too, and simple processors may have very complex buses, which you will need to emulate as well. Alternatively, you may go for User-mode Linux, if you are happy with it.
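
To make the point about the bus more concrete, here is a minimal sketch of how an emulator might route CPU loads and stores to RAM or to a device, depending on the address. Everything in it is an assumption for illustration: the class names, the address map, and the write-only console device are hypothetical and do not correspond to any real board. A real machine adds interrupt controllers, timers, DMA and wider access sizes on top of this.

```python
class BusError(Exception):
    """Raised when an access hits an address that nothing is mapped to."""


class Ram:
    def __init__(self, size):
        self.data = bytearray(size)

    def read8(self, offset):
        return self.data[offset]

    def write8(self, offset, value):
        self.data[offset] = value & 0xFF


class ConsoleDevice:
    """Hypothetical write-only console: bytes stored to offset 0 are output."""

    def __init__(self):
        self.output = bytearray()

    def read8(self, offset):
        return 0

    def write8(self, offset, value):
        if offset == 0:
            self.output.append(value & 0xFF)


class Bus:
    """Dispatches byte accesses to whichever device owns the address."""

    def __init__(self):
        self.regions = []  # list of (base, size, device)

    def map(self, base, size, device):
        self.regions.append((base, size, device))

    def _find(self, addr):
        for base, size, device in self.regions:
            if base <= addr < base + size:
                return device, addr - base
        raise BusError(f"no device mapped at {addr:#x}")

    def read8(self, addr):
        device, offset = self._find(addr)
        return device.read8(offset)

    def write8(self, addr, value):
        device, offset = self._find(addr)
        device.write8(offset, value)


# Example address map (made up): 4 KiB of RAM at 0, console at 0x1000_0000.
bus = Bus()
console = ConsoleDevice()
bus.map(0x0000_0000, 0x1000, Ram(0x1000))
bus.map(0x1000_0000, 0x10, console)
for byte in b"ok":
    bus.write8(0x1000_0000, byte)
```

The CPU core then only ever talks to `bus.read8` and `bus.write8`, which keeps the instruction emulation independent of whatever peripherals the kernel expects to find.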

In the end, I would suggest something "old": reasonably simple, especially in privileged mode, well studied and documented. For example, the original MIPS, the Motorola 68k family, or something close to the original RISC (http://en.wikipedia.org/wiki/Berkeley_RISC), if there is a Linux variant for it!

Lorenzo Dematté
  • Why would the bus need to be emulated? Surely that's transparent to the kernel? – Richard Keller Mar 06 '13 at 11:36
  • @RichardKeller Some parts will be, some will not. The kernel also needs to deal with I/O, which means, for example, interacting with the programmable interrupt controller, with direct I/O and/or DMA... – Lorenzo Dematté Mar 06 '13 at 12:31
1

You might look at the MicroBlaze, a processor designed for efficient implementation on FPGAs. It has only two instruction formats and 32 primary opcode values.

It is defined and supported by Xilinx for its line of FPGAs; the reference document is at: http://www.xilinx.com/support/documentation/sw_manuals/mb_ref_guide.pdf

HBP