How to test my generated assembly program?

Question

I've written a program that generates assembly instructions for my vector extension to perform convolution, based on the arguments it is given. Note that I assume my vector extension doesn't have loop or branch instructions.

However, if I set input width = 7, kernel width = 3, input channels = 128, output channels = 4, then the number of generated instructions is almost 90,000. I have an instruction simulator for this vector processor, but I can't figure out how to check whether my generated instructions are sane.

Is there a good place to start, or any good ideas?

@PeterCordes I totally agree with you. But it is quite hard for me to find where to start debugging when the data doesn't match the 'good implementation'. Do you have any good or convenient ideas? — laurent01, Jul 21 '20 at 01:32
Ah, debugging is a different question from pass/fail testing which your question was asking about. Added a section about that in my answer, and updated your question tags to include debugging. — Peter Cordes, Jul 21 '20 at 01:40

Peter Cordes · Accepted Answer · 2020-07-21T01:39:54.430

The obvious thing would be to run it with some fully randomized test inputs, and compare against the result of a simple known-good implementation with the same data input. (e.g. written in C or your favourite high-level language, possibly just running on the host CPU, not inside the simulator). A simple implementation running inside your simulator would be good to have as well, or instead if that's easier.
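A minimal sketch of such a randomized pass/fail harness, in Python. The data layout (`inp[c][y][x]`, `kernel[o][c][ky][kx]`), the square input, stride 1 and no padding are all assumptions made for illustration; adjust them to whatever your generator actually implements. `run_generated` is a hypothetical callable that you would write to generate the instructions, run them in your simulator, and return the output in the same layout.

```python
import math
import random

def conv2d_reference(inp, kernel, in_w, k_w, in_ch, out_ch):
    """Naive convolution (assumed: square input, stride 1, no padding)."""
    out_w = in_w - k_w + 1
    out = [[[0.0] * out_w for _ in range(out_w)] for _ in range(out_ch)]
    for o in range(out_ch):
        for y in range(out_w):
            for x in range(out_w):
                acc = 0.0
                for c in range(in_ch):
                    for ky in range(k_w):
                        for kx in range(k_w):
                            acc += inp[c][y + ky][x + kx] * kernel[o][c][ky][kx]
                out[o][y][x] = acc
    return out

def random_case(in_w, k_w, in_ch, out_ch, seed):
    """Build reproducible random input and kernel data from a seed."""
    rng = random.Random(seed)
    inp = [[[rng.uniform(-1.0, 1.0) for _ in range(in_w)]
            for _ in range(in_w)] for _ in range(in_ch)]
    kernel = [[[[rng.uniform(-1.0, 1.0) for _ in range(k_w)]
                for _ in range(k_w)] for _ in range(in_ch)]
              for _ in range(out_ch)]
    return inp, kernel

def flatten(out):
    """Flatten out[o][y][x] into one list for element-wise comparison."""
    return [v for plane in out for row in plane for v in row]

def run_random_tests(run_generated, n_cases, in_w, k_w, in_ch, out_ch,
                     abs_tol=1e-5):
    """Return the seeds of all cases where run_generated disagrees.

    run_generated is a hypothetical wrapper around your generator plus
    simulator; abs_tol is a placeholder to tune for your data and precision.
    """
    failing = []
    for seed in range(n_cases):
        inp, kernel = random_case(in_w, k_w, in_ch, out_ch, seed)
        expected = flatten(conv2d_reference(inp, kernel, in_w, k_w, in_ch, out_ch))
        got = flatten(run_generated(inp, kernel, in_w, k_w, in_ch, out_ch))
        ok = len(got) == len(expected) and all(
            math.isclose(g, e, rel_tol=0.0, abs_tol=abs_tol)
            for g, e in zip(got, expected))
        if not ok:
            failing.append(seed)
    return failing
```

Because each case is built from a seed, a failing seed can be replayed exactly when you move on to debugging. It is also cheap to sweep many small shapes (e.g. 1 or 2 channels) before trying the 128-channel case.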

When you compare results, you may need to allow some wiggle room for FP rounding errors if your simple implementation uses a different order of operations. A pretty standard approach is to check that the absolute differences are all within 1e-7 or so, or to check relative differences (although relative error can be large for numbers near zero that resulted from subtraction; catastrophic cancellation is a known problem for FP).

(See also https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/ and the rest of Bruce's series of FP articles if you're not already aware of these issues.)
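As a rough illustration of that kind of check, here is a small Python sketch that accepts an element if it is within either an absolute or a relative tolerance, and reports the index of every element that fails. The function name and the default tolerances are placeholders, not recommendations; pick values that suit your data range and precision.

```python
import math

def find_mismatches(got, expected, abs_tol=1e-7, rel_tol=1e-6):
    """Return (index, got, expected) for every element outside tolerance.

    math.isclose passes when |g - e| <= max(rel_tol * max(|g|, |e|), abs_tol),
    so the absolute tolerance covers values near zero, where relative error
    alone would be misleading.
    """
    if len(got) != len(expected):
        raise ValueError("length mismatch: %d vs %d" % (len(got), len(expected)))
    bad = []
    for i, (g, e) in enumerate(zip(got, expected)):
        if not math.isclose(g, e, rel_tol=rel_tol, abs_tol=abs_tol):
            bad.append((i, g, e))
    return bad
```

Returning the indices rather than a single pass/fail flag makes it easier to see whether mismatches are isolated or follow a pattern, such as every element of one output channel.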

Perhaps worth having a reference implementation that computes in double-precision so you have a better idea what the actual correct answers are, when evaluating a computation with rounding errors.
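A sketch of that idea in Python, whose floats are already double-precision. Single precision is emulated by rounding through `struct`, on the assumption that your vector unit uses IEEE float32 with separate multiply and add (no FMA); that is a guess about your hardware, not something stated in the question. The vector length of 128 * 3 * 3 matches one output element in the example from the question.

```python
import math
import random
import struct

def to_f32(x):
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def dot_f32(a, b):
    """Dot product with each multiply and each add rounded to float32."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_f32(acc + to_f32(x * y))
    return acc

def dot_f64(a, b):
    """Double-precision reference; fsum avoids accumulating rounding error."""
    return math.fsum(x * y for x, y in zip(a, b))

rng = random.Random(0)
n = 128 * 3 * 3  # input channels * kernel height * kernel width
a = [to_f32(rng.uniform(-1.0, 1.0)) for _ in range(n)]
b = [to_f32(rng.uniform(-1.0, 1.0)) for _ in range(n)]

ref = dot_f64(a, b)
err_fwd = abs(dot_f32(a, b) - ref)              # one summation order
err_rev = abs(dot_f32(a[::-1], b[::-1]) - ref)  # the reverse order
print("forward error:", err_fwd, "reverse error:", err_rev)
```

Two float32 results that differ from each other can both be acceptable; measuring each against the double-precision value shows how much error is normal for this size of sum, which helps in choosing a tolerance.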

Debugging when data doesn't match the reference:

Test again with very simple input data, like all 0.0 except a 1.0 in one element. That might highlight a wrong array indexing problem. Or all 1.0, or all -2.0.

Or some input that should produce a very simple output, for the known algorithm you're trying to implement. e.g. if most outputs are supposed to be 0.0, seeing which ones aren't, or what value they have, could be a big hint.
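One way to set this up, sketched in Python: put a single 1.0 in the input, and give every kernel weight a unique integer-valued tag, so that any non-zero output value tells you exactly which weight was read. The tagging scheme, the helper names, and the layout (`inp[c][y][x]`, `kernel[o][c][ky][kx]`, stride 1, no padding, no kernel flip) are assumptions for illustration.

```python
def make_impulse(in_ch, in_w, c0, y0, x0):
    """All-zero input with a single 1.0 at channel c0, row y0, column x0."""
    inp = [[[0.0] * in_w for _ in range(in_w)] for _ in range(in_ch)]
    inp[c0][y0][x0] = 1.0
    return inp

def make_tagged_kernel(out_ch, in_ch, k_w):
    """Kernel whose weights are unique small integers (exact in float32)."""
    return [[[[float(((o * in_ch + c) * k_w + ky) * k_w + kx + 1)
               for kx in range(k_w)] for ky in range(k_w)]
             for c in range(in_ch)] for o in range(out_ch)]

def decode_tag(value, in_ch, k_w):
    """Recover (o, c, ky, kx) from a tagged weight value."""
    idx = int(value) - 1
    idx, kx = divmod(idx, k_w)
    idx, ky = divmod(idx, k_w)
    o, c = divmod(idx, in_ch)
    return o, c, ky, kx

def expected_impulse_output(kernel, in_w, k_w, c0, y0, x0):
    """Output for the impulse input: each non-zero element is one weight.

    With out[o][y][x] = sum of inp[c][y+ky][x+kx] * kernel[o][c][ky][kx],
    only ky = y0 - y and kx = x0 - x can contribute.
    """
    out_w = in_w - k_w + 1
    out = []
    for o in range(len(kernel)):
        plane = [[0.0] * out_w for _ in range(out_w)]
        for y in range(out_w):
            for x in range(out_w):
                ky, kx = y0 - y, x0 - x
                if 0 <= ky < k_w and 0 <= kx < k_w:
                    plane[y][x] = kernel[o][c0][ky][kx]
        out.append(plane)
    return out

kernel = make_tagged_kernel(out_ch=4, in_ch=128, k_w=3)
inp = make_impulse(in_ch=128, in_w=7, c0=5, y0=3, x0=3)
expected = expected_impulse_output(kernel, in_w=7, k_w=3, c0=5, y0=3, x0=3)
```

If the simulator's output has a non-zero value where `expected` has 0.0, or a different tag than expected, passing that value to `decode_tag` shows which output channel, input channel and kernel position the generated code actually used, which usually points at the faulty index calculation.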

Also note that most real-world CPUs have some kind of instruction cache, so it's usually worth a tiny bit of loop overhead (large unrolled loop) to recycle a loop body that fits in cache, instead of fully unrolling / peeling a loop into a huge block of straight-line code. (Like 90k instructions sounds like too much). But if there really isn't any simple repetition that can be amortized via unrolling, it's worth considering this.
