I misunderstand win32 (and maybe libc) strtok( )

Question

In some CGI code, I need to encode rarely-occurring '&', '<', and '>' chars. In the encoding function, I want to get out right away if there are no such chars in the input string. So, at entry, I try to use strtok( ) to find that out:

char *
encode_amp_lt_gt ( char *in ) {
  ...
  if ( NULL == strtok( in, "&<>" )) {
    return in;
  }
  ...
}

But, even in the absence of any of the delimiters, strtok( ) returns a pointer to the first character of in.

I expected it to return NULL if there are no delims in the string.

Is my code wrong, or is my expectation wrong? I don't want to call strchr( ) three times just to eliminate the usual case.

Thanks!

Unless you deal with very, very large text, `strchr` three times ain't gonna kill your performance. — zneak, Jun 29 '11 at 04:38
@zneak has a point on premature optimization, but I'll take it a bit further and ask if you've considered doing this in a higher-level language than C. I know PHP has a library function for encoding special characters and I'd be shocked if it wasn't easy in Perl, Python, and Ruby as well. — Jordan Wade, Jun 29 '11 at 04:54
If you are afraid of repeating `strchr`s there is also `strpbrk`. — asveikau, Jun 29 '11 at 05:53

zneak · Answer 1 · 2011-06-29T19:55:16.580

You probably don't want strtok to begin with, as it leaves you no way of figuring out which delimiter character was overwritten (unless you have a spare copy of the string).

strtok is not a straightforward API and is easy to misunderstand.

Quoting the manpage:

 The strtok() and strtok_r() functions return a pointer to the beginning of
 each subsequent token in the string, after replacing the token itself with
 a NUL character.  When no more tokens remain, a null pointer is returned.

You have probably been tripped up by how the algorithm actually works. Suppose this string:

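char value[] = "foo < bar & baz > frob"; // a writable array: strtok modifies its argument, and writing to a string literal is undefined behavior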

The first time you call strtok:

char* ptr = strtok(value, "<>&");

strtok will return the value pointer, except that it will have modified the string to this:

"foo \0 bar & baz > frob"

As you may notice, it changed the < to a NUL. From now on, if you use value as a string, you'll get "foo ", because the string now ends at that NUL.

Subsequent calls to strtok with NULL will proceed through the string, until you've reached the end of the string, at which point you'll get NULL.

char str[] = "foo < bar & frob > nicate"; // writable array, not a pointer to a string literal
printf("%s\n", strtok(str, "<>&")); // prints "foo "
printf("%s\n", strtok(NULL, "<>&")); // prints " bar "
printf("%s\n", strtok(NULL, "<>&")); // prints " frob "
printf("%s\n", strtok(NULL, "<>&")); // prints " nicate"
assert(strtok(NULL, "<>&") == NULL); // should be true
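The case from the question (no delimiter anywhere in the input) can be checked the same way. The following is only a sketch of the edge cases; the function name strtok_edge_cases is made up for illustration. With no delimiter present, the whole string counts as one token, so the first call returns a pointer to its first character. NULL comes back on the first call only when there is no token at all.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical self-check for the first strtok() call on three inputs. */
void strtok_edge_cases(void) {
    char plain[] = "no delimiters here";
    char only_delims[] = "<>&";
    char empty[] = "";

    /* No delimiter present: the whole string is a single token, so the
       result points at its first character rather than being NULL. */
    assert(strtok(plain, "&<>") == plain);

    /* Nothing but delimiters: no token exists, so the result is NULL. */
    assert(strtok(only_delims, "&<>") == NULL);

    /* Empty string: no token either. */
    assert(strtok(empty, "&<>") == NULL);
}
```

So a NULL check on the first call tells you whether the string contains a token, not whether it contains a delimiter.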

It would be fairly straightforward to write a function that replaces the contents without strtok, either dealing with the hard work yourself, or getting help from strpbrk and strcat.
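As a rough sketch of that idea, here is one possible shape for the function from the question, built on strpbrk. Several things here are assumptions rather than anything stated in the question: the replacements are taken to be the HTML entities &amp;, &lt; and &gt;, the result is malloc'd instead of living in a static buffer, and memcpy is used where strcat would also work.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns `in` itself when nothing needs encoding. Otherwise returns a
   newly malloc'd encoded copy (NULL if allocation fails). */
char *encode_amp_lt_gt(char *in) {
    const char *specials = "&<>";

    /* Usual case: one scan, no copy, no modification of the input. */
    if (strpbrk(in, specials) == NULL)
        return in;

    /* Worst case: every character expands to "&amp;" (5 bytes). */
    char *out = malloc(strlen(in) * 5 + 1);
    if (out == NULL)
        return NULL;

    char *dst = out;
    const char *src = in;
    const char *hit;
    while ((hit = strpbrk(src, specials)) != NULL) {
        /* Copy the plain text that precedes the special character. */
        size_t n = (size_t)(hit - src);
        memcpy(dst, src, n);
        dst += n;

        /* Assumed HTML entity replacements. */
        const char *rep = (*hit == '&') ? "&amp;"
                        : (*hit == '<') ? "&lt;"
                        :                 "&gt;";
        size_t rlen = strlen(rep);
        memcpy(dst, rep, rlen);
        dst += rlen;

        src = hit + 1;
    }
    strcpy(dst, src); /* remaining tail, including the terminating NUL */
    return out;
}
```

With this variant, the caller has to free the result only when the returned pointer differs from the one passed in.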

Thanks! Yeah, it's not well explained on the man page. So now I know that strtok( ) will always return non-NULL for a non-empty string. — Pete Wilson, Jun 29 '11 at 05:03

score 3 · Accepted Answer · answered Jun 29 '11 at 04:43

The function you want is strpbrk, not strtok. The bigger question is - how is the string that is being returned being allocated when you're replacing things, and how does the calling function know if it should free it or not?
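For illustration, a minimal sketch of what strpbrk gives you. The helper names first_special and first_special_demo are invented for this example. Unlike strtok, it does not modify the string, it returns NULL when none of the characters occur, and otherwise it points at the first match, so you can still see which character it was.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: pointer to the first '&', '<' or '>' in s,
   or NULL if s contains none of them. Does not modify s. */
const char *first_special(const char *s) {
    return strpbrk(s, "&<>");
}

void first_special_demo(void) {
    const char *text = "foo < bar & baz";

    /* Nothing to encode: NULL, which is the early-exit test. */
    assert(first_special("nothing to encode") == NULL);

    /* Otherwise the result points at the first special character. */
    assert(first_special(text) == text + 4);
    assert(*first_special(text) == '<');
}
```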

answered Jun 29 '11 at 04:43

Random832


The expanded, returned string is in a static char[ ] array that's big enough, so I'm ok there, but THANKS for mentioning it. Please always do point out this potential ass-biting (as I know from painful experience) error when you have a chance. Thanks also for mentioning strpbrk( ), I never use it but now I will. – Pete Wilson Jun 29 '11 at 04:58
It took me a while to grok how deep your question is. For this app, the answer is: if the ptr returned to the caller is diff from the ptr the caller passed in, then that ptr has to be free( )ed. – Pete Wilson Jun 29 '11 at 06:36
