static const int class[UCHAR_MAX] =

{ [(unsigned char)'a'] = LOWER, /*macro value classifying the characters*/
  [(unsigned char)'b'] = LOWER,
.
.
.
}

This is just an idea. Is it a bad one?

  • What are you trying to do, speed up character classification? – Nikolai Fetissov Feb 09 '10 at 20:42
  • Define bad (time, space, readability/usability). – 3lectrologos Feb 09 '10 at 20:44
  • I've seen code just like this in the MSVC runtime, it's used for islower, isupper, isalpha, etc. – John Knoeller Feb 09 '10 at 20:45
  • @Nikolai: I need to index another array by these character classes, and I want to get the classification as fast as I can. @3lectrologos: By "bad" I mean that you immediately see a better way, by whatever metric you choose. – Questionable Feb 09 '10 at 20:48
  • I kind of doubt that wasting 1K to classify 256 chars, of which you probably hit only about 50, is going to be better than a couple of comparisons, but only measurement can tell. – Nikolai Fetissov Feb 09 '10 at 20:59

4 Answers


Designated initializers are in C99, not C89. They also exist as a GCC extension for C89, but will not be portable.

Other than that, the use of lookup tables is a common way to handle classification of a small number of objects quickly.

Edit: One correction though: the size of the array should be UCHAR_MAX+1, so that UCHAR_MAX itself is a valid index.
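
To make the size correction concrete, here is a minimal C99 sketch of such a table. The class codes (UNCLASSIFIED, LOWER, UPPER, DIGIT), the name char_class, and the helper classify are assumptions for illustration; the question does not show its real macro values, and only a few entries are filled in.

```c
#include <limits.h>

/* Hypothetical class codes; the question does not show its real values. */
enum { UNCLASSIFIED = 0, LOWER, UPPER, DIGIT };

/* UCHAR_MAX + 1 entries, so every unsigned char value (including
   UCHAR_MAX itself) is a valid index. Entries without a designator
   are zero-initialized, i.e. UNCLASSIFIED here. */
static const int char_class[UCHAR_MAX + 1] = {
    [(unsigned char)'a'] = LOWER,
    [(unsigned char)'z'] = LOWER,
    [(unsigned char)'A'] = UPPER,
    [(unsigned char)'0'] = DIGIT
};

/* Convert to unsigned char before indexing: plain char may be signed,
   and a negative index would read outside the table. */
static int classify(char c)
{
    return char_class[(unsigned char)c];
}
```

With this sketch, classify('a') yields LOWER, while a character that has no designator, such as 'b', falls back to UNCLASSIFIED.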

interjay

BTW, GCC's designated initializer extensions allow for

static const int class[] = {
    [0 ... UCHAR_MAX] = UNCLASSIFIED,
    [(unsigned)'0' ... (unsigned)'9'] = DIGIT,
    [(unsigned)'A' ... (unsigned)'Z'] = UPPER,
    [(unsigned)'a' ... (unsigned)'z'] = LOWER,
 };

initializers applying to ranges of indices, with later initializations overriding earlier ones.

This is very non-standard, though: range designators are in neither C89/C90 nor C99.
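
If portability matters, roughly the same table can be filled at run time with plain loops, which works in C89 as well. This is only a sketch: set_range and init_char_class are hypothetical names, the class codes are assumed, and the letter ranges assume an ASCII-compatible character set (the C standard guarantees contiguity only for '0' through '9').

```c
#include <limits.h>

enum { UNCLASSIFIED = 0, LOWER, UPPER, DIGIT };  /* hypothetical class codes */

/* Static storage is zero-initialized, so every entry starts as UNCLASSIFIED. */
static int char_class[UCHAR_MAX + 1];

/* Assign cls to every index in [first, last], like one range designator. */
static void set_range(int first, int last, int cls)
{
    int i;
    for (i = first; i <= last; i++)
        char_class[i] = cls;
}

/* Call once before the first lookup. The letter ranges assume ASCII;
   on EBCDIC, for example, 'a'..'z' is not contiguous. */
static void init_char_class(void)
{
    set_range('0', '9', DIGIT);
    set_range('A', 'Z', UPPER);
    set_range('a', 'z', LOWER);
}
```

The trade-off is that the table can no longer be const, and init_char_class must run before any lookup.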

ephemient

Unfortunately, that is not portable in C89/90.

$ gcc -std=c89 -pedantic test.c -o test
test.c:4: warning: ISO C90 forbids specifying subobject to initialize
test.c:5: warning: ISO C90 forbids specifying subobject to initialize
mbauman

Aside from using int rather than unsigned char for the element type (which wastes 768 bytes, assuming a 4-byte int and 256 entries), I consider this a very good idea/implementation. Keep in mind that it depends on C99 features, so it won't work with old C89/C90 compilers.

On the other hand, simple conditionals should be the same speed and much smaller in code size, but they can only represent certain natural classes efficiently.

#define is_ascii_letter(x) (((unsigned)(x)|32)-97<26)
#define is_digit(x) ((unsigned)(x)-'0'<10)

etc.
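
Both macros rely on unsigned wraparound: subtracting the lower bound turns any value below the range into a very large unsigned number, so a single comparison checks both ends. In is_ascii_letter, OR-ing with 32 first folds ASCII upper case onto lower case. A small sketch of how they might be used follows; count_classes is a hypothetical helper, and is_ascii_letter assumes ASCII.

```c
/* Macros as given above. is_ascii_letter assumes ASCII:
   |32 maps 'A'..'Z' onto 'a'..'z', and 97 is 'a'. */
#define is_ascii_letter(x) (((unsigned)(x)|32)-97<26)
#define is_digit(x) ((unsigned)(x)-'0'<10)

/* Hypothetical helper: count ASCII letters and digits in a string. */
static void count_classes(const char *s, int *letters, int *digits)
{
    *letters = 0;
    *digits = 0;
    for (; *s != '\0'; s++) {
        /* Go through unsigned char so negative char values do not
           sign-extend into the unsigned arithmetic. */
        unsigned char c = (unsigned char)*s;
        if (is_ascii_letter(c))
            ++*letters;
        else if (is_digit(c))
            ++*digits;
    }
}
```

Characters just outside the ranges, such as '@', '[', '`' and '{', are rejected because the subtraction either wraps around or lands at 26 or above.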

R.. GitHub STOP HELPING ICE