I am writing a program in C to convert a hex file compiled for the LC3 processor back into assembly language.

Currently, I am trying to decode the ADD instruction.

There are two types of ADDs in LC3 assembly language:

  • add by reference: adding two registers
  • add immediate: adding a register to a hard-coded value

For example, the hex code 164F would be converted to `ADD R3, R1, R7`; this is an add by reference. Conversely, the hex code 153F would be converted to `ADD R2, R4, #-1`; this is an add immediate.

The function should decode both as appropriate.

Matt
  • Writing a [decompiler](http://en.wikipedia.org/wiki/Decompiler) as a beginner programmer is not going to be an easy task, in fact it's going to be *very* hard. And how to do it also depends very much on the platform you're decompiling for, as some architectures have variable-sized instructions. – Some programmer dude Apr 16 '15 at 00:14
  • What you should first do is learn about the I/O functions and do some testing with them. Having a [good reference](http://en.cppreference.com/w/c) will help here. Getting to know the language, the library and the platform you're programming on is also good before you try to write any kind of large or complex program. Then for your actual assignment, you should read the manuals and documentation of the CPU architecture ***thoroughly***; they will tell you what every bit in the instructions means. – Some programmer dude Apr 16 '15 at 00:16
  • The microarchitecture that we are using is the LC3, and all instructions are 16 bits wide. Thank you for the quick response. Please let me know if there is any more necessary information that I have left out. – Matt Apr 16 '15 at 00:17
  • Also, I know what all the instructions mean. I have been studying the microarchitecture for about a month now and have played around with hex and assembly. My issue is not with the architecture but with I/O in C. – Matt Apr 16 '15 at 00:20
  • Use `fopen` to open the file, `fread` to read the file into memory (say, into an array of `uint16_t`), and use a loop to inspect each instruction. Not much to it. – nobody Apr 16 '15 at 00:32
  • @JoachimPileborg: A disassembler, which is what the OP is describing, would be a lot easier than a decompiler. – Keith Thompson Apr 16 '15 at 00:49
  • @KeithThompson True. I think I need a coffee (except I don't drink coffee... ;)) – Some programmer dude Apr 16 '15 at 00:59
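
The `fopen`/`fread` approach suggested in the comments might look roughly like the sketch below. It assumes the input is a raw binary image of 16-bit words stored high byte first; that is an assumption, since the question does not say how the file is laid out, and a text file of hex digits would need `fscanf` with `%x` instead. `load_words` is a hypothetical helper name, not something from the question.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read up to max 16-bit words from an open binary
 * stream into out. Assumes each word is stored high byte first. */
size_t load_words(FILE *fp, uint16_t *out, size_t max)
{
    unsigned char pair[2];
    size_t n = 0;

    /* fread returns the number of complete 2-byte items read (0 or 1),
     * so a trailing odd byte is ignored. */
    while (n < max && fread(pair, sizeof pair, 1, fp) == 1) {
        out[n++] = (uint16_t)((pair[0] << 8) | pair[1]);
    }
    return n;
}
```

The caller would open the file with `fopen(path, "rb")`, call the helper, and then loop over the returned words, dispatching on the top four bits of each one.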

1 Answer

> I was wondering if I could just have some help with this first function: `void printAdd(int instruction);`

Well, the functions themselves are already assuming you have detected the opcode and dispatched accordingly, so we don't need to deal with that part here. As you say, there are two forms of the ADD instruction. An instruction set reference shows their structure as:

```
ADD DR, SR1, SR2 = 0001 DR SR1 0 00 SR2
ADD DR, SR, IMM5 = 0001 DR SR  1 IMM5
```

What this means is bit #5 differentiates between the two versions. You will need to branch on that bit. Other than that, it's just some bit twiddling to extract the numbers, and plain simple printf that I hope you know how to use. Something like:

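```c
#include <stdio.h>

/* Print an LC-3 ADD instruction; the caller has already matched opcode 0001. */
void printAdd(int instruction)
{
    /* DR is bits 11..9, SR1 is bits 8..6. */
    printf("ADD R%d, R%d, ", (instruction >> 9) & 7, (instruction >> 6) & 7);
    if (instruction & 0x20)   /* bit 5 set: immediate form */
    {
        printf("#%d\n", instruction & 0x1F);   /* imm5, not sign-extended */
    } else {                  /* bit 5 clear: register form */
        printf("R%d\n", instruction & 7);      /* SR2 is bits 2..0 */
    }
}
```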

Sign-extending the immediate is left as an exercise ;)
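
For readers who want to check their attempt, one possible way to do the sign extension is sketched below. `sext5` is a hypothetical helper name; it treats bit 4 as the sign bit of the 5-bit field and is only one of several equivalent ways to write this.

```c
/* Hypothetical helper: interpret the low 5 bits of an instruction as a
 * two's-complement value in the range -16..15. */
int sext5(int instruction)
{
    int imm = instruction & 0x1F;   /* keep bits 4..0 */
    if (imm & 0x10)                 /* bit 4 set: the value is negative */
        imm -= 0x20;                /* subtract 2^5 to sign-extend */
    return imm;
}
```

Printing `sext5(instruction)` instead of `instruction & 0x1F` in the immediate branch of `printAdd` would make 153F come out as `ADD R2, R4, #-1`, matching the example in the question.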

Jester
  • Extension to my incomplete response: Great response, thank you. I have one follow-up question regarding the binary shifts you are performing. For example's sake, let's say we have the add instruction: 0001 001 010 000 001. Now I know there must be something wrong with how I am interpreting this, so please correct me where I am wrong: The first formatted decimal performs a binary right shift of 9 on the instruction, giving us: 0001 001 & 0000 111 (7). Would this not return zero? How does it receive the correct register from this? Again, thank you for your response! – Matt Apr 16 '15 at 01:07
  • `0001 001 & 0000 111` is not zero, it's `0000 001`. Masking with 7 means, keep the low 3 bits. Of course that may give you zero, and that is encoding a valid register `R0`. But in this example that's `R1`. – Jester Apr 16 '15 at 01:11
  • Thank you Jester, cleared it up for me :) – Matt Apr 16 '15 at 04:52
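
The shift-and-mask step discussed in these comments can be checked with a small sketch. The helper name `field3` is hypothetical; the value 0x1281 used in the note afterwards is the bit pattern 0001 001 010 000 001 from the comment, written in hex.

```c
/* Hypothetical helper: extract the 3-bit field whose lowest bit sits at
 * bit position shift. */
int field3(int instruction, int shift)
{
    /* The shift moves the field down to bits 2..0; & 7 (binary 111)
     * then discards everything above it, including the opcode bits. */
    return (instruction >> shift) & 7;
}
```

`field3(0x1281, 9)` gives 1, `field3(0x1281, 6)` gives 2, and `field3(0x1281, 0)` gives 1, so that instruction decodes as `ADD R1, R2, R1`.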