
When using std::codecvt's in method to decode an external byte sequence to an internal char sequence, is there a situation where the destination buffer of internal chars needs space for more than one internal char?

Here is some code for reference:

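```cpp
#include <cwchar>   // std::mbstate_t
#include <locale>   // std::codecvt, std::use_facet

// Assumed to be in scope:
// const std::locale& loc;
// std::mbstate_t state;
// const char *extern_buf_ptr;    // start of the bytes read so far
// const char *extern_buf_eptr;   // one past the last byte read
typedef std::codecvt<wchar_t, char, std::mbstate_t> codecvt_type;
const codecvt_type *pcodecvt = &std::use_facet<codecvt_type>(loc);

wchar_t intern_char;    // destination buffer: room for exactly one wchar_t
wchar_t *intern_next;   // set by in() to one past the last wchar_t written
std::codecvt_base::result in_res = pcodecvt->in(state,
        extern_buf_ptr, extern_buf_eptr, extern_buf_ptr,
        &intern_char, &intern_char + 1, intern_next);
```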

This is a simplification of some template code that I have written to decode bytes read individually from a Winsock SOCKET, where the user desires "unbuffered" input. Basically, with each iteration of a loop, a byte is read into the external buffer. The loop terminates when in_res is not std::codecvt_base::partial.
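A minimal sketch of that byte-at-a-time loop, assuming a hypothetical `ByteSource` class that stands in for the Winsock SOCKET and a hypothetical `read_one_wchar` helper (neither is from the original code). It keeps the bytes that `in()` has not consumed yet, keeps reading while the facet reports `partial` (or `ok` with no output, e.g. after a shift sequence), and stops once one `wchar_t` has been written. It is an illustration under those assumptions, not the author's actual template code.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <string>

typedef std::codecvt<wchar_t, char, std::mbstate_t> codecvt_type;

// Hypothetical stand-in for the socket: hands out one byte per call and
// returns false once the bytes are exhausted.
class ByteSource {
public:
    explicit ByteSource(const std::string& data) : data_(data), pos_(0) {}
    bool next_byte(char& c) {
        if (pos_ >= data_.size()) return false;
        c = data_[pos_++];
        return true;
    }
private:
    std::string data_;
    std::size_t pos_;
};

// Hypothetical helper: feeds bytes to codecvt::in one at a time until it
// writes exactly one wchar_t. Returns false on a conversion error, a noconv
// result, or if the input ends mid-character. Note: if in() produces a
// character while leaving bytes unconsumed, those bytes are discarded here.
bool read_one_wchar(ByteSource& src, const codecvt_type& cvt,
                    std::mbstate_t& state, wchar_t& out) {
    std::string pending;       // bytes read but not yet consumed by in()
    wchar_t* to_next = &out;   // in() sets this past the last wchar_t written
    std::codecvt_base::result res;
    do {
        char c;
        if (!src.next_byte(c)) return false;  // input ended before a full character
        pending.push_back(c);
        const char* from = pending.data();
        const char* from_next = from;
        res = cvt.in(state, from, from + pending.size(), from_next,
                     &out, &out + 1, to_next);
        // Drop the bytes the facet consumed so they are not fed in twice.
        pending.erase(0, static_cast<std::size_t>(from_next - from));
        // Keep reading while nothing has been written and the facet
        // either wants more input (partial) or consumed only state bytes (ok).
    } while (to_next == &out &&
             (res == std::codecvt_base::partial || res == std::codecvt_base::ok));
    return to_next == &out + 1 && res != std::codecvt_base::error;
}
```

Usage: obtain the facet with `std::use_facet<codecvt_type>(loc)`, start from a value-initialized `std::mbstate_t`, and call `read_one_wchar` repeatedly, passing the same `state` each time so shift state carries over between characters.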

What I am wondering is: Is there a scenario where a call to in() would require space in the destination buffer for more than one internal character? I.e., is there a scenario that would make the above-described loop an infinite loop?

Daniel Trebbien
Note that `wchar_t` is defined to be wide enough to hold any character value of "the system's character set". So it is reasonable to assume that any `wchar_t` string sequence is required to be processable one by one. – Kerrek SB Nov 22 '11 at 23:09

1 Answer


There's a note in §22.4.1.4.2/3 of the C++11 standard ([locale.codecvt.virtuals]) to that effect:

basic_filebuf assumes that the mappings from internal to external characters is 1 to N: a codecvt facet that is used by basic_filebuf must be able to translate characters one internal character at a time

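So any codecvt facet that can be used by basic_filebuf (that is, any locale that's good for I/O streams) must be able to produce internal characters one at a time, which is all a one-wchar_t destination buffer needs. Such a locale is good for your use as well.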
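Beyond that guarantee, the facet itself reports how many external bytes one internal character can take: `max_length()` returns the largest value `length()` can return for a single internal character, and `encoding()` returns -1 for a state-dependent encoding, 0 for a variable-width one, or N if every internal character takes exactly N bytes. A minimal sketch that queries them, where `FacetInfo` and `describe_facet` are hypothetical names introduced for illustration:

```cpp
#include <cwchar>
#include <locale>

typedef std::codecvt<wchar_t, char, std::mbstate_t> codecvt_type;

// Hypothetical summary of what a facet reports about its byte-to-wchar_t mapping.
struct FacetInfo {
    bool always_noconv;  // true if in() would only copy bytes through
    int encoding;        // -1: state-dependent, 0: variable width, N > 0: N bytes per wchar_t
    int max_length;      // largest number of bytes needed for one wchar_t
};

// Hypothetical helper: reads the properties from the locale's codecvt facet.
FacetInfo describe_facet(const std::locale& loc) {
    const codecvt_type& cvt = std::use_facet<codecvt_type>(loc);
    FacetInfo info;
    info.always_noconv = cvt.always_noconv();
    info.encoding = cvt.encoding();
    info.max_length = cvt.max_length();
    return info;
}
```

One possible use, assuming you want a safety net in the byte-at-a-time loop: if more than `max_length()` bytes have been fed to `in()` without it producing a character, treat the input as malformed instead of reading further.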

Cubbi