
I have written a low-level optimization for the LLVM code-generator backend. Basically, the optimization reorders assembly instructions at the basic-block level so that a later (existing) optimization can optimize the resulting code more efficiently. There are a number of test cases I'd like to verify, and since this is the first time I've attempted something like this, I'd like some suggestions for the testing process.

Things I've considered so far:

  1. Compile benchmarks written in C and examine the resulting ASM generated with the -S option. I have done this and compared the output with my optimization against the original output. This method shows me that my optimization works, but even if I write custom non-executable C files, I will not be able to exercise all of my desired instruction-ordering test cases.

  2. Compile benchmarks to LLVM assembly, edit that, then lower it to target machine assembly. This may work, but because LLVM assembly and target ASM sit at different levels of abstraction, I doubt I'd be able to cover all the test cases by hacking at the LLVM assembly until it generates what I want.

  3. Use the target ASM test cases as input to LLVM and recompile them with the new optimization. I was unable to find an option in either LLVM or gcc (most of whose options LLVM accepts) that accepts ASM as input.

What is a good strategy for testing specific ASM test cases when validating a low-level ASM compiler optimization? Does LLVM (or gcc) have some command line options that would make this process easier?


Edit: To clarify, I'm not asking about automatically generating ASM test cases; my problem is that I have those test cases (e.g., ASM_before.s and reference_ASM_after.s) but I need to be able to pass ASM_before.s into LLVM and ensure that the optimized output ASM_after.s matches known good reference_ASM_after.s. I'm looking for a way to do this without having to "decompile" ASM_before.s into a high-level language and then compile it (with the optimization) down to ASM_after.s.
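For the comparison step alone (not for getting ASM_before.s into LLVM), one lightweight option is a normalized line-by-line check of ASM_after.s against reference_ASM_after.s, so that indentation, blank lines, and comments don't cause spurious failures. This is only a sketch under stated assumptions: `asm_equivalent` and `next_asm_line` are hypothetical helper names, it assumes a single line-comment character per target (for example `@` for ARM GNU assembler syntax, where `#` marks immediates), and it does not understand string literals. LLVM's own regression tests usually express this kind of check with FileCheck patterns instead.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the next non-empty line of an assembly listing
   into out, stripping everything from comment_char onward and collapsing runs
   of whitespace to one space. Blank and comment-only lines are skipped.
   Returns the position after the consumed line, or NULL at end of input.
   Limitation: a comment_char inside a string directive is also treated as a comment. */
static const char *next_asm_line(const char *p, char comment_char,
                                 char *out, size_t cap) {
    while (*p) {
        size_t n = 0;
        bool pending_space = false;
        while (*p && *p != '\n') {              /* walk one physical line */
            char c = *p++;
            if (c == comment_char) {            /* drop the rest of the line */
                while (*p && *p != '\n') p++;
                break;
            }
            if (isspace((unsigned char)c)) {    /* ignore leading whitespace */
                pending_space = (n > 0);
                continue;
            }
            if (pending_space && n + 1 < cap) out[n++] = ' ';
            pending_space = false;
            if (n + 1 < cap) out[n++] = c;
        }
        if (*p == '\n') p++;
        out[n] = '\0';
        if (n > 0) return p;                    /* found a meaningful line */
    }
    return NULL;
}

/* Returns true if both listings contain the same normalized lines in the same order. */
bool asm_equivalent(const char *actual, const char *expected, char comment_char) {
    char la[256], lb[256];
    assert(actual != NULL && expected != NULL);
    const char *pa = actual, *pb = expected;
    for (;;) {
        pa = next_asm_line(pa, comment_char, la, sizeof la);
        pb = next_asm_line(pb, comment_char, lb, sizeof lb);
        if (pa == NULL || pb == NULL) return pa == pb;  /* must end together */
        if (strcmp(la, lb) != 0) return false;
    }
}
```

In a driver, you would read both files into memory and call `asm_equivalent(actual, reference, '@')`; a plain `diff` works just as well if you don't need the normalization.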

Zeke
  • You could use an automatic test generator. It generates many tests (random code following some rules, or even self-checking tests), and then you measure coverage of the code generator. Writing a good test generator is not easy, but it can produce more tests than a team of test writers. – osgx May 22 '11 at 02:01
  • (3) GCC is not an asm-optimizing compiler. If you run `gcc file.S`, it will not run the code generator; it will just start the `gas` assembler. – osgx May 22 '11 at 02:03

2 Answers


Benchmarking is a slippery slope: you can come up with a benchmark that makes any language or tool look good or bad, depending on what you are trying to prove.

First off, I normally work on ARM platforms with no operating system, so it is quite simple to time the execution, sometimes down to the clock cycle (plus or minus one), to compare compilers or options.

Things get worse on platforms with a cache. If you add or remove nops from your startup code, the whole program moves in memory and everything changes its cache alignment. Even without any compiler optimization changes, you can sometimes find larger performance differences from cache effects than from differences in compiler or backend optimizations.
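On a hosted system, a basic timing harness might look like the sketch below. It assumes a POSIX host with `clock_gettime(CLOCK_MONOTONIC, ...)`; on a bare-metal ARM board like the ones described above you would read a hardware cycle counter or timer instead. `min_runtime_ns` and `sample_workload` are hypothetical names. Taking the minimum of several runs filters out some run-to-run noise, but it does nothing about the code-layout and cache-alignment effects just described, which stay fixed for a given binary.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Runs fn `runs` times and returns the fastest run in nanoseconds
   (UINT64_MAX if the clock could not be read). */
uint64_t min_runtime_ns(void (*fn)(void), int runs) {
    assert(fn != NULL && runs > 0);
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < runs; i++) {
        struct timespec t0, t1;
        if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0) continue;
        fn();
        if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0) continue;
        /* Unsigned wraparound in the nanosecond term cancels out in the total. */
        uint64_t ns = (uint64_t)(t1.tv_sec - t0.tv_sec) * 1000000000u
                    + (uint64_t)t1.tv_nsec - (uint64_t)t0.tv_nsec;
        if (ns < best) best = ns;
    }
    return best;
}

/* Hypothetical workload; the volatile sink keeps the loop from being optimized away. */
static volatile uint32_t sink;
void sample_workload(void) {
    uint32_t acc = 0;
    for (uint32_t i = 0; i < 100000u; i++) acc = acc * 31u + i;
    sink = acc;
}
```

You would build the same harness with the stock and the modified compiler and compare `min_runtime_ns(sample_workload, 10)` across the two binaries.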

I normally run Dhrystone, but I don't declare victory or failure based on it. You might want to run Whetstone as well if you use floating point, or Whetstone with a soft FPU.

As already mentioned by someone above, self-checking tests are a good idea. So is real-world code. For example, take compression routines: compress some text (perhaps a portion of a book from Project Gutenberg), decompress it, and compare the output to the input. For extra validation, compress the text on a control platform such as your host and hardcode the compressed size into the test; then, if the compressed size under test does not match, the test fails even though the round trip produced the right output. I have also used the JPEG library to convert images to and from JPEG. Because lossy compression does not return the image to its original state, you can do just one conversion and then checksum the result, verify its size, or carry a copy of the expected output and compare. AES and DES encryption and decryption work well too.
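The compression round trip with a hardcoded size check can be sketched as follows. A toy run-length encoder stands in for a real compressor here (the answer suggests real libraries); `rle_encode`, `rle_decode`, `self_check`, the sample input, and `EXPECTED_ENCODED_SIZE` are all illustrative assumptions. With real data, the expected size would be recorded from a run on a trusted control build.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy run-length encoder: emits (count, byte) pairs with count in 1..255.
   out must hold at least 2 * n bytes. Returns the encoded length. */
size_t rle_encode(const unsigned char *in, size_t n, unsigned char *out) {
    assert(in != NULL || n == 0);
    size_t o = 0;
    for (size_t i = 0; i < n;) {
        unsigned char b = in[i];
        size_t run = 1;
        while (i + run < n && in[i + run] == b && run < 255) run++;
        out[o++] = (unsigned char)run;
        out[o++] = b;
        i += run;
    }
    return o;
}

/* Decodes (count, byte) pairs into out, never writing more than cap bytes. */
size_t rle_decode(const unsigned char *in, size_t n, unsigned char *out, size_t cap) {
    size_t o = 0;
    for (size_t i = 0; i + 1 < n; i += 2)
        for (size_t k = 0; k < (size_t)in[i] && o < cap; k++) out[o++] = in[i + 1];
    return o;
}

/* Hand-computed for the sample input below: runs a*4, b*3, c*3, d*1 -> 4 pairs. */
#define EXPECTED_ENCODED_SIZE 8

/* Returns 0 on success, 1 if the round trip is broken, 2 if the output is
   correct but the compressed size differs from the control build. */
int self_check(void) {
    static const unsigned char input[] = "aaaabbbcccd";
    const size_t n = sizeof input - 1;            /* exclude the terminating NUL */
    unsigned char encoded[2 * sizeof input];
    unsigned char decoded[sizeof input];

    size_t enc_len = rle_encode(input, n, encoded);
    size_t dec_len = rle_decode(encoded, enc_len, decoded, sizeof decoded);
    if (dec_len != n || memcmp(decoded, input, n) != 0) return 1;
    if (enc_len != EXPECTED_ENCODED_SIZE) return 2;
    return 0;
}
```

Returning `self_check()` from `main` lets a test harness judge pass or fail by exit status alone, which is convenient on bare-metal targets with only a minimal I/O channel.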

There is a vast body of open-source projects that you can build with your modified compiler to compare it against the stock compiler or other compilers. Being real-world code, it is the kind of thing your compiler will be used with anyway. Notice that when you go to Tom's Hardware or other benchmark sites, there are many different benchmarks: the time it takes to render something, to compile gcc or Linux, or to perform a database search, a whole set of real-world applications. The various applications get various scores; it is very rare for one platform or solution to sweep the whole battery of tests.

When performance drops as you make changes, that is the time to examine the assembly output and try to figure out why. Remember what Michael Abrash (and others) said: no matter how good you think your assembly is, you still have to time it. Also try crazy things that you are sure will be slow, because sometimes you find out they are fast for reasons you never thought of.

old_timer

Is LLVM's opt command what you are looking for?

Nemo
  • It's close, but translation from ASM to LLVM bitcode is necessary before running `opt`, so it's not a true Target ASM -> Internal Compiler Representation -> Optimization -> Target ASM solution. – Zeke Jun 14 '11 at 18:59
  • I assumed you meant LLVM asm, not machine-specific ASM. I seriously doubt there is any way to do what you ask, since the machine-specific asm loses a lot of information (like types and visibility) that the LLVM asm provides. Don't you want to develop and test your optimization pass against the machine-independent LLVM asm anyway? – Nemo Jun 14 '11 at 20:25
  • The optimization pass runs on the machine-specific ASM (stored in LLVM as MachineBasicBlock entries) and reorders the instructions. Ideally, I would be able to use target ASM as input, convert it straight to MachineBasicBlock entries, run the new optimization, and then convert the entries back into actual ASM. – Zeke Jun 14 '11 at 21:16
  • OK. As a last resort, try asking on the [LLVMdev mailing list](http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev). If they do not support this, maybe you can implement it and contribute it to them :-) – Nemo Jun 14 '11 at 21:24