What is the best way to convert a wide string to Base64?

user694655



Base64 maps groups of three octets (8-bit bytes) to four 6-bit symbols. It works on bytes, not characters, so it behaves the same way regardless of your string's encoding.

To be clear: Base64 is not a character encoding. Sender and receiver need to agree on the character encoding (ASCII, UTF-8, UTF-16, UCS-2, etc.) as well as on the transport method (Base64, gzip, etc.).
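To illustrate that Base64 only sees bytes, here is a minimal encoder sketch (RFC 4648 alphabet with `=` padding). `base64_encode` is a hypothetical helper, not a library API; it takes `const void*` plus a byte count, in the spirit of the `void*` suggestion in the comments.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Minimal Base64 encoder (RFC 4648 alphabet, '=' padding).
// It consumes raw bytes, not characters: whatever character encoding the
// caller used is carried through as opaque octets.
std::string base64_encode(const void* data, std::size_t size) {
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    assert(data != nullptr || size == 0);
    const unsigned char* bytes = static_cast<const unsigned char*>(data);

    std::string out;
    out.reserve((size + 2) / 3 * 4);
    std::size_t i = 0;
    // Each group of 3 octets (24 bits) becomes 4 six-bit symbols.
    for (; i + 2 < size; i += 3) {
        unsigned long n = (static_cast<unsigned long>(bytes[i]) << 16) |
                          (static_cast<unsigned long>(bytes[i + 1]) << 8) |
                          bytes[i + 2];
        out += alphabet[(n >> 18) & 0x3F];
        out += alphabet[(n >> 12) & 0x3F];
        out += alphabet[(n >> 6) & 0x3F];
        out += alphabet[n & 0x3F];
    }
    // 1 or 2 leftover octets are zero-padded and marked with '='.
    std::size_t rest = size - i;
    if (rest == 1) {
        unsigned long n = static_cast<unsigned long>(bytes[i]) << 16;
        out += alphabet[(n >> 18) & 0x3F];
        out += alphabet[(n >> 12) & 0x3F];
        out += "==";
    } else if (rest == 2) {
        unsigned long n = (static_cast<unsigned long>(bytes[i]) << 16) |
                          (static_cast<unsigned long>(bytes[i + 1]) << 8);
        out += alphabet[(n >> 18) & 0x3F];
        out += alphabet[(n >> 12) & 0x3F];
        out += alphabet[(n >> 6) & 0x3F];
        out += '=';
    }
    return out;
}
```

Because the input is just bytes, the same text yields different Base64 depending on its encoding: `é` as UTF-8 (`C3 A9`) encodes to `w6k=`, while as UTF-16LE (`E9 00`) it encodes to `6QA=`. The receiver has to know which character encoding to decode back into.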

Ben Voigt
  • To clarify, since a `wchar_t` is not an octet, you have to convert wide strings to arrays of octets before base64 encoding. – Dietrich Epp May 23 '11 at 14:13
  • To clarify further, you need to decide whether to convert your wide string to an intermediate form like UTF-8 before encoding to Base64, or to just skip that, typecast the `wchar_t*` to `const char*` and encode to Base64. – Mike DeSimone May 23 '11 at 14:18
  • @Dietrich, @Mike: A well-designed conversion API would be taking a `void*`. The size of the chunks the API processes at a time is completely an implementation detail, and it might very well use 16- or 32-bit words internally. `void*` is the right type for binary data (see also `fread`, `memcpy`). – Ben Voigt May 23 '11 at 14:24
  • @Ben Voigt: But a `wchar_t` is not portable, so if you reverse the encoding on a different platform, you will get a mangled string. Some platforms have 16-bit, others have 32-bit `wchar_t`. Some are big-endian, others little-endian. Since `wchar_t` is not byte-oriented, it should not be base64 encoded, whether or not the API generates a type error. – Dietrich Epp May 23 '11 at 14:35
  • @Dietrich: That's true, but no one but you is talking about `wchar_t`. Regardless, Base64 preserves the encoding whatever it is. Your comments do apply to Kirill's answer though. But I've added to my answer to clarify this. – Ben Voigt May 23 '11 at 15:50
  • @Ben Voigt: The question title is "wide string", or does that mean something else besides a string of `wchar_t`...? – Dietrich Epp May 23 '11 at 22:29
  • @Dietrich: It could mean that, or `char16_t` or `char32_t` strings, or any other string with characters outside the ASCII range. If we're being generous, even UTF-8 could qualify. – Ben Voigt May 23 '11 at 23:44
  • @Ben: "Wide string" is defined as a sequence of wide characters in the relevant standard. "Wide character" is defined as `wchar_t`. The terms "multibyte character" or "multibyte string" are used when speaking of `char16_t` and `char32_t`, strings of which may be *initialized with* wide string literals. I have never heard the term "wide string" outside the C/C++ community, so I use the definition from the C/C++ standards. – Dietrich Epp May 24 '11 at 03:36
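Following Dietrich's point, one portable option is to convert the wide string to UTF-8 first and Base64-encode the resulting bytes, so the output no longer depends on `sizeof(wchar_t)` or byte order. This is a sketch under an explicit assumption: `wchar_t` holds UTF-16 code units where it is 16 bits wide (e.g. Windows) and UTF-32 code points where it is 32 bits wide (e.g. typical Unix). `wide_to_utf8` is a hypothetical helper, not a standard function.

```cpp
#include <cassert>
#include <string>

// Convert a wide string to UTF-8 bytes.
// Assumption: 16-bit wchar_t holds UTF-16, 32-bit wchar_t holds UTF-32.
// Unpaired surrogates and values above U+10FFFF become U+FFFD.
std::string wide_to_utf8(const std::wstring& in) {
    std::string out;
    for (std::wstring::size_type i = 0; i < in.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(in[i]);
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            unsigned long lo = static_cast<unsigned long>(in[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                // High + low surrogate pair -> one supplementary code point.
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            cp = 0xFFFD;  // unpaired surrogate or out-of-range value
        }
        assert(cp <= 0x10FFFF);  // now a valid Unicode scalar value
        // Emit 1 to 4 UTF-8 bytes depending on the code point's range.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The result's `data()` and `size()` can then be handed to any byte-oriented Base64 encoder. The standard library's `std::wstring_convert` can also do this conversion, but it is deprecated since C++17; on Windows, `WideCharToMultiByte` with `CP_UTF8` is another option.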

To encode data to Base64, you can use the `Base64` class from the Xerces-C++ library. It could look like the following:

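#include <string>
#include <vector>
#include <xercesc/util/Base64.hpp>
#include <xercesc/util/XMLString.hpp>

// Assumes xercesc::XMLPlatformUtils::Initialize() has already been called.
std::wstring input_string = L"some wide string"; // any non-empty wide string
// Copy into contiguous memory (not needed in C++11, which guarantees
// contiguous std::wstring storage)
std::vector<wchar_t> raw_str( input_string.begin(), input_string.end() );

// Note: the encoded bytes depend on sizeof(wchar_t) and byte order
XMLSize_t len;
XMLByte* data_encoded = xercesc::Base64::encode( reinterpret_cast<const XMLByte*>(&raw_str[0]), raw_str.size()*sizeof(wchar_t), &len );
XMLCh* text_encoded = xercesc::XMLString::transcode( reinterpret_cast<char*>(data_encoded) );

// text_encoded now holds the Base64 text; use it here

xercesc::XMLString::release( &text_encoded );
xercesc::XMLString::release( reinterpret_cast<char**>(&data_encoded) );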
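The receiving side of this raw-bytes approach has to reverse both steps: decode Base64 to bytes, then reinterpret them as `wchar_t`. A standard-library-only sketch (C++17 for `std::optional`) follows; `base64_decode` and `wstring_from_bytes` are hypothetical helpers, not Xerces API. As noted in the comments on the first answer, the reinterpretation only reproduces the original string when `sizeof(wchar_t)` and byte order match the sender's.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <optional>
#include <string>
#include <vector>

// Decode standard Base64 (RFC 4648 alphabet, '=' padding required).
// Returns std::nullopt on malformed input.
std::optional<std::vector<unsigned char>> base64_decode(const std::string& in) {
    // Map one Base64 symbol to its 6-bit value, or -1 if invalid.
    auto value = [](char c) -> int {
        if (c >= 'A' && c <= 'Z') return c - 'A';
        if (c >= 'a' && c <= 'z') return c - 'a' + 26;
        if (c >= '0' && c <= '9') return c - '0' + 52;
        if (c == '+') return 62;
        if (c == '/') return 63;
        return -1;
    };
    if (in.size() % 4 != 0) return std::nullopt;

    std::vector<unsigned char> out;
    out.reserve(in.size() / 4 * 3);
    for (std::size_t i = 0; i < in.size(); i += 4) {
        // Only the final quartet may end in one or two '=' characters.
        std::size_t pad = 0;
        if (i + 4 == in.size() && in[i + 3] == '=') {
            pad = (in[i + 2] == '=') ? 2 : 1;
        }
        unsigned long n = 0;
        for (std::size_t j = 0; j < 4; ++j) {
            int v = (j + pad >= 4) ? 0 : value(in[i + j]);
            if (v < 0) return std::nullopt;
            n = (n << 6) | static_cast<unsigned long>(v);
        }
        // 24 bits -> up to 3 bytes; padding drops the trailing ones.
        out.push_back(static_cast<unsigned char>((n >> 16) & 0xFF));
        if (pad < 2) out.push_back(static_cast<unsigned char>((n >> 8) & 0xFF));
        if (pad < 1) out.push_back(static_cast<unsigned char>(n & 0xFF));
    }
    assert(out.size() <= in.size() / 4 * 3);
    return out;
}

// Reinterpret decoded bytes as wchar_t units. Only meaningful if the sender
// used the same sizeof(wchar_t) and byte order as this machine.
std::optional<std::wstring> wstring_from_bytes(const std::vector<unsigned char>& bytes) {
    if (bytes.size() % sizeof(wchar_t) != 0) return std::nullopt;
    std::wstring out(bytes.size() / sizeof(wchar_t), L'\0');
    if (!bytes.empty()) std::memcpy(&out[0], bytes.data(), bytes.size());
    return out;
}
```

A byte count that is not a multiple of `sizeof(wchar_t)` is an early sign that the data came from a platform with a different `wchar_t` size; a matching size, however, does not prove the byte order matches.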
Kirill V. Lyadvinsky

If you are using Visual C++ with MFC, there are already library functions for this: check out `Base64Encode` and `Base64Decode` (declared in ATL's `atlenc.h`).

Jonathan Wood