What is the best way to find the first unescaped occurrence of a character in a given string?

This is how I did it, but I have a feeling it's overly complicated.

/*
 * Just like strchr, but find first -unescaped- occurrence of c in s.
 */
char *
strchr_unescaped(char *s, char c) 
{
  int i, escaped;
  char *p;

  /* Search for c until an unescaped occurrence is found or end of string is
     reached. */
  for (p=s; p=strchr(p, c); p++) {
    escaped = -1;
    /* We found a c. Backtrack from its location to determine if it is
       escaped. */
    for (i=1; i<=p-s; i++) {
      if (*(p-i) == '\\') {
        /* Switch escaped flag every time a \ is found. */
        escaped *= -1;
        continue;
      }
      /* Stop backtracking when something other than a \ is found. */
      break;
    }
    /* If an odd number of escapes were found, c is indeed escaped. Keep 
       looking. */
    if (escaped == 1) 
      continue;
    /* We found an unescaped c! */
    return p;
  }
  return NULL;
}
Robert Kajic
  • Depends on the definition of best, but your solution seems like more work than necessary. Rather than using strchr and backtracking (which looks at each backslash twice), you could read forward and keep track of state (escaped/unescaped), thus only looking at each character one time. – William Pursell Mar 13 '12 at 00:18
  • I see what you mean. On the other hand, this allows me to test for escapes only when I know it is really necessary. With your solution the cost of keeping track of escapes will be paid for every examined character, regardless of whether there are escapes or not. I guess which is better depends on the nature of the tested strings, i.e. the ratio of escaped characters. – Robert Kajic Mar 13 '12 at 00:57
  • Depends on what you mean by "cost". strchr() is looking at all of those characters that your code is avoiding checking for being an escape, so it's not like they aren't being tested, although you would have to check each char for both c and \ (which doesn't seem very costly, though if you used a lookup table you could check both at once). – Scott Hunter Mar 13 '12 at 01:05
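
The forward scan described in the comments above could look roughly like this. It is only a minimal sketch, not code from the thread: the name strchr_unescaped_fwd is made up, and it assumes c is not '\0'. It reads each character once and remembers whether the previous character was an unescaped backslash.

```c
#include <stddef.h>

/* Hypothetical forward-scanning variant of strchr_unescaped.
 * Assumption: c is not '\0'. */
char *
strchr_unescaped_fwd(char *s, char c)
{
  int escaped = 0;   /* nonzero if the previous char was an unescaped \ */
  char *p;

  for (p = s; *p != '\0'; p++) {
    if (escaped) {
      /* This character is escaped, whatever it is. Skip it. */
      escaped = 0;
    } else if (*p == c) {
      /* Unescaped c (this also covers c == '\\'). */
      return p;
    } else if (*p == '\\') {
      /* An unescaped backslash escapes the next character. */
      escaped = 1;
    }
  }
  return NULL;
}
```

For the string a\,b,c and c = ',' this should skip the comma at index 2 and return a pointer to the one at index 4.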

1 Answer

If the search character is fairly rare, your approach is reasonable. C library routines like strchr are usually hand-optimized, often in assembly, and tend to run faster than an equivalent loop written in C. Some hardware has machine instructions for searching through blocks of memory; a C library routine that uses them can run much faster than a hand-written loop.

To tighten up your approach a little, how about this:

#include <string.h>

#define isEven(a) (((a) & 1) == 0)

char* strchr_unescaped( char* s, char c )
{
    char* p = strchr( s, c );
    while (p != NULL) {   /* loop through all the c's */
        char* q = p;   /* scan backwards through preceding escapes */
        while (q > s && *(q-1) == '\\')
            --q;
        if (isEven( p - q ))   /* even number of esc's => c is good */
            return p;
        p = strchr( p+1, c );   /* else odd escapes => c is escaped, keep going */
    }
    return NULL;
}
Peter Raynham
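
A possible middle ground between the approaches discussed in this thread is to let a library routine find the next character that is either c or a backslash, and then handle escapes going forward, so nothing is scanned twice. The following is a minimal sketch using the standard strcspn; the function name strchr_unescaped_cspn is hypothetical, and this is a variation, not the code from the answer above. Whether it is actually faster depends on the C library and on the input.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical variant: skip ahead with strcspn, resolve escapes forward. */
char *
strchr_unescaped_cspn(char *s, char c)
{
    char stops[3];   /* the characters we stop at: c and backslash */
    char *p = s;

    stops[0] = c;
    stops[1] = '\\';
    stops[2] = '\0';

    for (;;) {
        p += strcspn(p, stops);   /* jump to next c, backslash, or end */
        if (*p == '\0')
            return NULL;          /* end of string, nothing found */
        if (*p == c)
            return p;             /* reached without a pending escape */
        /* *p is a backslash: it escapes the following character. */
        if (p[1] == '\0')
            return NULL;          /* trailing backslash, nothing follows */
        p += 2;                   /* skip the backslash and the escaped char */
    }
}
```

Since every backslash consumes the character after it, any c that the loop stops on is necessarily unescaped, so no backward counting is needed.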