Retargeting gcc/llvm for a new Harvard architecture RISC

Question

Could someone with compiler experience tell me whether my idea is workable?

I am going to propose a project idea to one of my professors and take a "project course". I won't go into the process much here; it's an undergrad project course where I get to propose the idea.

I am going to design a new RISC ISA (similar to MIPS, but without delay slots, floating point, and so on) and write a software emulator for it, since the whole project is to be done in software.

But I am going to make it a Harvard architecture CPU, meaning that data cannot be executed; code and data are kept separate.

Designing the ISA and writing an emulator for it is relatively easy; I don't expect to run into anything that I am not already familiar with.

Then I want to write a gcc or llvm backend for it, so that C programs can be compiled for my new ISA.

I have never written a compiler. Since my ISA is mostly modelled after MIPS, I expect I can model the backend after the MIPS (or some other RISC) backend.

The question I have is about the Harvard architecture part. Should I expect to run into major problems here? How would this complicate the code generation part?

In the end, I will write a report on my emulated Harvard CPU and analyse some of its security aspects (e.g. not allowing data execution could prevent buffer overflow attacks).

In case I abused the term "Harvard architecture" here, let me clarify what I mean. I am not talking about internal caches or anything like that. I am just referring to having the CPU keep code and data separate so that data cannot be executed.
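
To make the separation concrete, here is a minimal sketch of the kind of emulator core I have in mind. It is only an illustration: the five opcodes, the `(opcode, a, b)` tuple format, the `HarvardCPU` class and the memory sizes are all invented for this example and are not the actual ISA. The point it shows is that instruction fetch only ever reads `code`, while loads and stores only ever touch `data`, so nothing a program writes can later be fetched as an instruction.

```python
from dataclasses import dataclass, field

# Hypothetical opcodes and (opcode, a, b) format, invented for this sketch.
LI, STORE, LOAD, JUMP, HALT = "li", "store", "load", "jump", "halt"


@dataclass
class HarvardCPU:
    code: tuple  # instruction memory: immutable, indexed only by pc
    data: bytearray = field(default_factory=lambda: bytearray(16))  # data memory
    regs: list = field(default_factory=lambda: [0] * 4)
    pc: int = 0

    def step(self) -> bool:
        """Execute one instruction; return False once HALT is reached."""
        op, a, b = self.code[self.pc]  # fetch always reads instruction memory
        self.pc += 1
        if op == LI:        # regs[a] = immediate b
            self.regs[a] = b
        elif op == STORE:   # data[regs[b]] = regs[a]; can only reach data memory
            self.data[self.regs[b]] = self.regs[a] & 0xFF
        elif op == LOAD:    # regs[a] = data[regs[b]]
            self.regs[a] = self.data[self.regs[b]]
        elif op == JUMP:    # pc = a; the target is an index into code, never data
            self.pc = a
        elif op == HALT:
            return False
        else:
            raise ValueError(f"unknown opcode {op!r}")
        return True

    def run(self, max_steps: int = 1000) -> None:
        for _ in range(max_steps):
            if not self.step():
                return
        raise RuntimeError("step limit reached")


program = (
    (LI, 0, 0x2A),   # r0 = 42
    (LI, 1, 0),      # r1 = 0 (an address in data memory)
    (STORE, 0, 1),   # data[0] = r0
    (LOAD, 2, 1),    # r2 = data[0]
    (HALT, 0, 0),
)

cpu = HarvardCPU(code=program)
cpu.run()
```

In this sketch address 0 of `data` and index 0 of `code` are unrelated locations, which is the property I am asking about: a C compiler targeting such a machine has to deal with code addresses and data addresses living in two distinct spaces.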

It might be helpful to have a look at SafeStack (http://clang.llvm.org/docs/SafeStack.html), since it's somewhat related to what you want to achieve. – box, Dec 18 '15 at 16:44
You don't have to do anything at the backend level unless you have additional requirements, such as storing code pointers only in special registers. In the frontend, you can enforce that function pointers live in a different address space, but that is not necessary either. I also recommend reading this before building a new LLVM backend: http://eli.thegreenplace.net/2012/11/24/life-of-an-instruction-in-llvm – SK-logic, Dec 18 '15 at 23:13

