I am representing a large number of objects (specifically MIPS32 instructions). My minimal working example describes an instruction in the R-format.

MIPS32 background (R-type instruction)

An R-type instruction is determined uniquely by the combination of its opcode and its funct (function) field. The opcode is the leftmost 6 bits of the instruction when represented as a 32-bit number, and the rightmost 6 bits make up the funct field.

A common decomposition of an R-type instruction is into bit fields of lengths (6, 5, 5, 5, 5, 6). The bit fields are then named as follows:

| 6 bits  | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits |
|:-------:|:------:|:------:|:------:|:------:|:------:|
| op      | rs     | rt     | rd     | shamt  | funct  |

Hence, the unique identifier (key) for an R-type instruction is the tuple (opcode, funct), which has a one-to-one relation with an R-type instruction.
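As a rough illustration of using (opcode, funct) as a key, the two 6-bit values can be packed into one integer and used for a table lookup. This is only a sketch: the class `RTypeKeys`, the method names, and the one-entry table are hypothetical, and the single entry, mul with (0x1c, 0x02), is the example used in this question.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the original project.
final class RTypeKeys {
    // Pack (opcode, funct) into a single 12-bit key.
    static int key(int opcode, int funct) {
        return ((opcode & 0x3f) << 6) | (funct & 0x3f);
    }

    // Illustrative table with a single entry.
    static final Map<Integer, String> NAMES = new HashMap<>();
    static {
        NAMES.put(key(0x1c, 0x02), "mul");
    }

    // Returns the instruction name, or null if the key is not in the table.
    static String lookup(int word) {
        return NAMES.get(key(word >>> 26, word & 0x3f));
    }
}
```

A lookup that returns null corresponds to a failure to identify the instruction, which is a different outcome from an identified instruction that fails a later validation step.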

Example: mul

Consider the 32-bit number 0x71014802. It is decomposed into fields of varying lengths depending on the format of the instruction.

For all numbers in the MIPS32 instruction set the leftmost six bits always represent the opcode for the instruction. The opcode alone is not always sufficient to identify the particular instruction, but it is always sufficient to identify the format of the instruction.

The leftmost six bits of 0x71014802 are 0x1c. It is known that this number corresponds to an instruction in the R-format. The format specifies which fields the remaining bits decompose into.

As alluded to previously, not every instruction can be identified by its opcode alone. In particular, no R-type instruction can.

Decomposing 0x71014802 into the fields shown in the above table yields rs=8, rt=1, rd=9, shamt=0, and funct=2. The decomposed representation of this instruction in hexadecimal form is thus [0x1c 8 1 9 0 2]. The corresponding decimal representation is [28 8 1 9 0 2].

To identify the particular instruction represented by 0x71014802, the funct field must be consulted. Pairing the opcode, 0x1c, with the value in the funct field uniquely identifies the instruction as a mul instruction.
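For reference, the decomposition can be reproduced with shifts and masks. This is a minimal sketch under the field layout given in the table; the class name `RFormatFields` and the method `decompose` are hypothetical and not taken from the project.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the original project.
final class RFormatFields {
    /** Splits a 32-bit word into {op, rs, rt, rd, shamt, funct}. */
    static int[] decompose(int word) {
        return new int[] {
            (word >>> 26) & 0x3f, // op:    bits 31..26
            (word >>> 21) & 0x1f, // rs:    bits 25..21
            (word >>> 16) & 0x1f, // rt:    bits 20..16
            (word >>> 11) & 0x1f, // rd:    bits 15..11
            (word >>> 6) & 0x1f,  // shamt: bits 10..6
            word & 0x3f           // funct: bits 5..0
        };
    }

    public static void main(String[] args) {
        // Prints [28, 8, 1, 9, 0, 2] for the mul example.
        System.out.println(Arrays.toString(decompose(0x71014802)));
    }
}
```

The unsigned shift `>>>` is used so that words with the top bit set do not drag sign bits into the opcode.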


In my source code I represent the mul instruction in the following manner:

/**
 * Multiply (without overflow). Put the low-order 32 bits of
 * the product of rs and rt into register rd.
 */
// TODO: Validate that shamt is 0
MUL(0x1c, 2, R::rd, R::rs, R::rt),

The method references R::rd, R::rs, R::rt are used to create a human-legible representation of the instruction by fetching the appropriate fields from the decomposed representation and by looking up register names in a table (this is of no importance to us, but it explains why it is there).

The TODO comment signifies that these objects should also satisfy zero or more conditions to be deemed valid. As you can see, a lot of information about MUL is stored in one place: its opcode and funct field, which uniquely identify it, as well as how to produce a human-legible representation.

What remains is to include the validation step as well.

In Python I would use a dictionary.

{'shamt': 0, 
 other conditions
}

that I would later parse.

In Java I would need either a statically initialized HashMap to represent this or, conceivably, a two-dimensional array (Object[][]), followed by some internal parsing and evaluation of that. In my opinion, the verbosity of Java would make the intent harder to comprehend.

What I would like to express is that when a particular function is called with a certain argument, it should return true.
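To make the map-based alternative concrete, here is one way it might look. It is a sketch only: the `Field` enum, `MUL_EXPECTED`, and `satisfies` are invented for illustration, and an `EnumMap` stands in for the `HashMap` so that the keys are named fields rather than strings.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch of the "map of expected values" approach.
final class ExpectedFields {
    enum Field { RS, RT, RD, SHAMT }

    // Extracts the value of a named field from a 32-bit R-format word.
    static int extract(Field f, int word) {
        switch (f) {
            case RS: return (word >>> 21) & 0x1f;
            case RT: return (word >>> 16) & 0x1f;
            case RD: return (word >>> 11) & 0x1f;
            default: return (word >>> 6) & 0x1f; // SHAMT
        }
    }

    // Java counterpart of the Python dictionary {'shamt': 0}.
    static final Map<Field, Integer> MUL_EXPECTED = new EnumMap<>(Field.class);
    static {
        MUL_EXPECTED.put(Field.SHAMT, 0);
    }

    // True if every listed field has its expected value.
    static boolean satisfies(Map<Field, Integer> expected, int word) {
        return expected.entrySet().stream()
                .allMatch(e -> extract(e.getKey(), word) == e.getValue());
    }
}
```

This form can only express equality with a constant; any other kind of condition would need a different value type in the map.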

I expect all of these conditions to evaluate to true so I would be fine with evaluating them later.

So I am thinking of some form of partial application. Say that I have a function

shamt(int expectedValue) {
    // Check that the value of shamt matched the expected value
}

then something akin to

new Supplier<Boolean>[] {
    0x00 -> RTypeInstruction::shamt
}

which obviously does not work, but might be a step in the right direction. It is important to have a named reference here, as specifying an ordering relation between integers is not sufficient: it gives no indication as to which bit field has to satisfy a particular condition.

I do not want an array specifying a condition for each bit field, as the conditions rarely affect more than one or two bit fields, although it does happen.

It could be argued that the identification step (opcode, funct) is also a set of conditions, but treating it that way makes it difficult to distinguish between a failure to identify an instruction and an instruction that is merely semi-valid. That is, we would like to be able to identify an instruction as the mul instruction even if the shamt is non-zero, and inform the user that the input is somewhat malformed (semi-valid).
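One possible reading of "partial application with a named reference" is a factory that fixes the field extractor and the expected value up front and returns a predicate over the instruction word. The sketch below assumes the extractors take the raw 32-bit word, which may differ from how the real `RTypeInstruction` exposes its fields; `FieldConditions` and `is` are hypothetical names.

```java
import java.util.function.IntPredicate;
import java.util.function.IntUnaryOperator;

// Hypothetical sketch; extractors operate on the raw instruction word.
final class FieldConditions {
    // Named field extractors, usable as method references.
    static int rs(int word)    { return (word >>> 21) & 0x1f; }
    static int shamt(int word) { return (word >>> 6) & 0x1f; }

    // Partial application: fix the field and the expected value now,
    // supply the instruction word later.
    static IntPredicate is(IntUnaryOperator field, int expected) {
        return word -> field.applyAsInt(word) == expected;
    }

    // Reads as "shamt is 0"; evaluated only when test(word) is called.
    static final IntPredicate SHAMT_IS_ZERO = is(FieldConditions::shamt, 0);
}
```

Because the field is passed as a method reference, the condition names the bit field it applies to, and evaluation is deferred until a word is available.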
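The distinction between "not identified" and "semi-valid" can be modelled as a three-way result. The following is a minimal sketch for the mul case only; `MulClassifier` and the `Status` names are made up for illustration.

```java
import java.util.function.IntPredicate;

// Hypothetical sketch separating identification from validation.
final class MulClassifier {
    enum Status { UNKNOWN, SEMI_VALID, VALID }

    static final int MUL_OPCODE = 0x1c;
    static final int MUL_FUNCT = 0x02;

    // Validation condition for mul: shamt must be 0.
    static final IntPredicate MUL_CONDITION = word -> ((word >>> 6) & 0x1f) == 0;

    static Status classifyAsMul(int word) {
        // Identification uses only (opcode, funct).
        boolean identified = (word >>> 26) == MUL_OPCODE && (word & 0x3f) == MUL_FUNCT;
        if (!identified) {
            return Status.UNKNOWN;
        }
        // Validation runs only after a successful identification.
        return MUL_CONDITION.test(word) ? Status.VALID : Status.SEMI_VALID;
    }
}
```

With this split, 0x71014802 would be classified as VALID, the same word with shamt set to 1 (0x71014842) as SEMI_VALID, and a word with a different funct as UNKNOWN.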

The values that the shamt method can operate on are stored internally in the enum. The reason for not specifying a plethora of methods, say

boolean shamtIsZero() { ... }

is that there are so many different conditions (see the later example). Is Java ill-suited for this? Should I use a HashMap instead and evaluate that, or is there some neat FunctionalInterface around that will help me do this?


Example: mtc1

There is another instruction, mtc1, identified by the opcode 0x11 and the funct field being set to 0x00.

/**
 * Move to coprocessor 1: move CPU register rt to register
 * fs in the FPU. fs occupies the rd field. Note the pattern that
 * the MTC1 and MFC1 operations share the same opcode and funct
 * field. The rs field distinguishes them.
 */
// TODO: Validate that rs = 4 and funct and shamt = 0
MTC1(0x11, 0x00, R::rt, R::fs)

As you can see, here we have to satisfy several conditions, and one of the bit fields has to be something other than 0. This is why we do not want an individual method for each condition: they would become too numerous.
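To illustrate how both TODO comments might be written down next to the enum constants, here is a sketch in which every condition carries the name of its field, so a failed check can be reported by name. All types here (`NamedConditions`, `Field`, `Condition`, `Instruction`, `violated`) are hypothetical, the human-readable formatting arguments are left out, and the word 0x44881000 used in the note below is a hand-assembled mtc1 encoding (rs = 4, rt = 8, rd = 2).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: conditions that remember which field they apply to.
final class NamedConditions {
    enum Field {
        RS(21, 5), SHAMT(6, 5);

        final int shift;
        final int mask;

        Field(int shift, int width) {
            this.shift = shift;
            this.mask = (1 << width) - 1;
        }

        int of(int word) { return (word >>> shift) & mask; }

        Condition is(int expected) { return new Condition(this, expected); }
    }

    static final class Condition {
        final Field field;
        final int expected;

        Condition(Field field, int expected) {
            this.field = field;
            this.expected = expected;
        }

        boolean holds(int word) { return field.of(word) == expected; }

        @Override
        public String toString() { return field + " = " + expected; }
    }

    enum Instruction {
        MUL(0x1c, 0x02, Field.SHAMT.is(0)),
        MTC1(0x11, 0x00, Field.RS.is(4), Field.SHAMT.is(0));

        final int opcode;
        final int funct;
        final Condition[] conditions;

        Instruction(int opcode, int funct, Condition... conditions) {
            this.opcode = opcode;
            this.funct = funct;
            this.conditions = conditions;
        }

        // Returns the conditions that the word fails; empty means valid.
        List<Condition> violated(int word) {
            List<Condition> failed = new ArrayList<>();
            for (Condition c : conditions) {
                if (!c.holds(word)) {
                    failed.add(c);
                }
            }
            return failed;
        }
    }
}
```

Under these assumptions, `Instruction.MTC1.violated(0x44881000)` is empty, while the same word with rs cleared (0x44081000) yields a single violated condition that prints as `RS = 4`. The funct check from the TODO is not repeated as a condition because funct is already part of the identification key.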

Filip Allberg
  • It's not completely clear to me the problem you're having. Could you maybe post an example of enum you have and conditions you want to verify? – Tunaki Nov 26 '15 at 11:17
  • How are you going to use those conditions? Where do you get the value to test them on? Say you have the MUL instance, what are you going to do with the condition you have specified for `shamt`? How are you going to distinguish that it applies to `shamt` rather than something else? Can you give another instruction which has other conditions? – RealSkeptic Nov 26 '15 at 11:50
  • I have edited the post to address your questions, Tunaki and @RealSkeptic – Filip Allberg Nov 26 '15 at 11:58
  • So, do you have a `shamt` field in the enum? And you would apply the condition to that? – RealSkeptic Nov 26 '15 at 12:20
  • @RealSkeptic it is accessible in the enum with the help of another class yes. You may refer to https://github.com/leksak/dark-mips32-decompiler/blob/master/src/main/java/se/filipallberg/dark/mips32decompiler/instruction/type/RTypeInstruction/RTypeInstruction.java for an initial starting point if you'd like. It is accessible through a method as is. I am able to make it accessible through a field. – Filip Allberg Nov 26 '15 at 12:42
  • Sorry, still not clear. Suppose you had written the `shamtIsZero` method? What would it have looked like? Where would it have gotten the `shamt` value to compare? – RealSkeptic Nov 26 '15 at 14:13

1 Answer

You can pass multiple conditions in the constructor. You can even make them Predicates:

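import java.util.Arrays;
import java.util.function.Predicate;

enum MIPS32Instructions {

    MUL("MUL", (v) -> v > 0),
    DIV("DIV", (v) -> v > 1, (v) -> v != 9);

    final String id;
    final Predicate<Integer>[] conditions;

    // @SafeVarargs silences the unchecked generic-array warning for the varargs parameter.
    @SafeVarargs
    MIPS32Instructions(String id, Predicate<Integer>... conditions) {
        this.id = id;
        this.conditions = conditions;
    }

    // Returns true only if every condition accepts v.
    public boolean checkConditions(int v) {
        return Arrays.stream(conditions)
                .allMatch((c) -> c.test(v));
    }

}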
OldCurmudgeon
  • If those are integers, why not `IntPredicate`s instead? – fge Nov 26 '15 at 11:41
  • This is almost what I want to do, but this is just a statement about a variable `v` that I want to satisfy a specific ordering relation with an `int`. Without having some sort of named reference my problem is not addressed. I will edit the original post to clarify this. – Filip Allberg Nov 26 '15 at 11:48