
I would like to broaden my knowledge and skills in compiler writing, especially optimizations. What optimizations are available for case statements whose selector expression is a string? For instance, in Object Pascal:

ReadLn(s);
case s of
  'abc','def': ...;
  'xyz'      : ...;
  otherwise    ...;
end;

In Free Pascal, this is translated into a sequence of calls to AnsiCompareText. What do other language implementations do? I know that at least PHP, Nimrod and Octave support this.

osgx
LeleDumbo

2 Answers

In C, switch does not work on char arrays (strings). You can get a similar effect to some extent by packing the first few characters into one integer with a bit-shifting macro and switching on that integer:

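#include <assert.h>
#include <stdint.h>

/* Pack five chars into one integer, 8 bits per char, so the fields never overlap. */
#define FIVE_CHARS(c1,c2,c3,c4,c5)            \
  ( (uint64_t)(unsigned char)(c1)             \
  | ((uint64_t)(unsigned char)(c2) << 8)      \
  | ((uint64_t)(unsigned char)(c3) << 16)     \
  | ((uint64_t)(unsigned char)(c4) << 24)     \
  | ((uint64_t)(unsigned char)(c5) << 32) )

/* Copy at most five chars so short arguments like "-h" are not read past '\0'. */
static uint64_t five_chars_of(const char *s) {
  unsigned char c[5] = {0};
  for (int i = 0; i < 5 && s[i] != '\0'; i++) c[i] = (unsigned char)s[i];
  return FIVE_CHARS(c[0], c[1], c[2], c[3], c[4]);
}

while (--argc > 0) {   /* argv[0] is the program name, so skip it */
  switch ( five_chars_of(argv[argc]) ) {
     case FIVE_CHARS('-','h','e','l','p')   :
     case FIVE_CHARS('-','-','h','e','l')   :   /* matches anything starting with "--hel" */
     case FIVE_CHARS('-','h','\0','\0','\0'):
     case FIVE_CHARS('-','?','\0','\0','\0'):
       usage();
     break;
     case FIVE_CHARS('-','a','r','g','1')   :
       setflag1();
     break;
     default:
       assert(!"Argument not supported");  /* a bare string literal is always true */
  }
}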

The compiler may compile this as a series of comparisons when there are only a few cases. When there are many, it typically emits a binary search over the case values, or a jump table if the values happen to be dense. This can significantly improve both code size and speed. Most of the bit shifts (those in the case labels) are done at compile time. The remaining one (in the switch expression) is cheap and is needed only once per argument, so there is little need to put the most common cases first.

Because the key encodes only the first five characters, strings that share a five-character prefix (such as "--help" and "--hello") land in the same case. For those cases you can add a nested switch on the following characters, or confirm the match with strcmp(). Needing strcmp() for only a few cases is still much better than a long chain of if (strcmp(...) == 0) {} else if (strcmp(...) == 0) {} else ...

technosaurus

As an application developer, I would put the cases that are most likely to be executed first, to limit the number of comparisons. Unfortunately, a compiler cannot know which cases are most likely until run time.

If I were writing my own compiler and encountered a case statement like the one above, I would probably sort the case labels and do a binary search to decide which branch to take. This would improve the worst case from a linear number of string comparisons to a logarithmic one.

Sparafusile
  • I'm actually thinking about integrating a suffix tree (in effect, a trie of the case labels) 'into' the resulting code. By that, I mean that no real tree is constructed; instead, the instruction sequence behaves like a search in the tree. This should be very efficient (perhaps more so with dead-path elimination) and costs only O(n), where n is the length of the string in the variable – LeleDumbo Feb 05 '11 at 17:10
  • @LeleDumbo That's actually what I was thinking, but I couldn't find the words. There's no reason you can't generate a very efficient search, since the entire set of case labels is known at compile time. – Sparafusile Feb 06 '11 at 00:51