What is the best way to convert a wide string to Base64?
3 Answers
6
Octet (8-bit symbols) to Base64 (6-bit symbols) conversion operates on bytes, not characters, so it works the same way regardless of your string encoding.
To be clear: Base64 is not a character encoding. Sender and receiver need to agree on the character encoding (ASCII, UTF-8, UTF-16, UCS-2, etc.) as well as the transport method (Base64, gzip, etc.).
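As a minimal sketch of the point that Base64 only ever sees octets, the encoder below takes a `void*` and a byte count and knows nothing about characters. The name `base64_encode` is a hypothetical helper written for illustration, not part of any library, and it uses the standard alphabet with `=` padding and no line breaks.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from any library): encodes `size` raw octets
// as Base64 using the standard alphabet and '=' padding.
std::string base64_encode(const void* data, std::size_t size) {
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    std::string out;
    out.reserve((size + 2) / 3 * 4);
    for (std::size_t i = 0; i < size; i += 3) {
        // Pack up to three octets into one 24-bit group.
        unsigned long group = static_cast<unsigned long>(bytes[i]) << 16;
        if (i + 1 < size) group |= static_cast<unsigned long>(bytes[i + 1]) << 8;
        if (i + 2 < size) group |= bytes[i + 2];
        // Emit four 6-bit symbols, padding when the input ran out.
        out += table[(group >> 18) & 0x3F];
        out += table[(group >> 12) & 0x3F];
        out += (i + 1 < size) ? table[(group >> 6) & 0x3F] : '=';
        out += (i + 2 < size) ? table[group & 0x3F] : '=';
    }
    return out;
}
```

For example, `base64_encode("Man", 3)` yields `TWFu`. Whatever bytes are passed in come back unchanged after decoding, so the receiver still has to know which character encoding those bytes were in.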

Ben Voigt
-
To clarify, since a `wchar_t` is not an octet, you have to convert wide strings to arrays of octets before base64 encoding. – Dietrich Epp May 23 '11 at 14:13
-
To clarify further, you need to decide whether to convert your wide string to an intermediate form like UTF-8 before encoding to Base64, or to just skip that, typecast the `wchar_t*` to `const char*` and encode to Base64. – Mike DeSimone May 23 '11 at 14:18
-
@Dietrich, @Mike: A well-designed conversion API would take a `void*`. The size of the chunks the API processes at a time is purely an implementation detail, and it might very well use 16- or 32-bit words internally. `void*` is the right type for binary data (see also `fread`, `memcpy`). – Ben Voigt May 23 '11 at 14:24
-
@Ben Voigt: But a `wchar_t` is not portable, so if you reverse the encoding on a different platform, you will get a mangled string. Some platforms have 16-bit, others have 32-bit `wchar_t`. Some are big or little endian. Since `wchar_t` is not byte oriented, it should not be base64 encoded, whether or not the API generates a type error. – Dietrich Epp May 23 '11 at 14:35
-
@Dietrich: That's true, but no one but you is talking about `wchar_t`. Regardless, Base64 preserves the encoding whatever it is. Your comments do apply to Kirill's answer though. But I've added to my answer to clarify this. – Ben Voigt May 23 '11 at 15:50
-
@Ben Voigt: The question title is "wide string", or does that mean something else besides a string of `wchar_t`...? – Dietrich Epp May 23 '11 at 22:29
-
@Dietrich: It could mean that, or `char16_t` and `char32_t`, or any other string with characters outside the ASCII range. If we're being generous, even UTF-8 could qualify. – Ben Voigt May 23 '11 at 23:44
-
@Ben: "Wide string" is defined as a sequence of wide characters in the relevant standard. "Wide character" is defined as `wchar_t`. The terms "multibyte character" or "multibyte string" are used when speaking of `char16_t` and `char32_t`, strings of which may be *initialized with* wide string literals. I have never heard the term "wide string" outside the C/C++ community, so I use the definition from the C/C++ standards. – Dietrich Epp May 24 '11 at 03:36
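The portability concern raised in these comments (differing `wchar_t` size and byte order) can be avoided by converting the wide string to an agreed octet encoding first. The following is a rough sketch of one way to produce UTF-8. `wide_to_utf8` is a hypothetical helper, and it assumes `wchar_t` holds UTF-16 code units when it is 16 bits wide and UTF-32 code points otherwise, which is common but not guaranteed by the standard. It does no validation of malformed input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts a wide string to UTF-8 octets.
// Assumption: 16-bit wchar_t means UTF-16, wider wchar_t means UTF-32.
std::string wide_to_utf8(const std::wstring& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(in[i]);
        if (sizeof(wchar_t) == 2) {
            cp &= 0xFFFF;  // guard against sign extension of a 16-bit unit
            // Combine a high/low surrogate pair into a single code point.
            if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
                unsigned long low = static_cast<unsigned long>(in[i + 1]) & 0xFFFF;
                if (low >= 0xDC00 && low <= 0xDFFF) {
                    cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                    ++i;
                }
            }
        }
        // Emit one to four UTF-8 octets depending on the code point's size.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The resulting `std::string` is a plain octet sequence with no byte-order dependence, so its `data()` and `size()` can be handed to any Base64 encoder, and the receiver only needs to know that the decoded bytes are UTF-8.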
1
To encode data as Base64 you can use the `Base64` class from the Xerces library. It could look like the following:
#include <string>
#include <vector>
#include <xercesc/util/Base64.hpp>
#include <xercesc/util/XMLString.hpp>

std::wstring input_string = SOME; // some wide string
// copy it into contiguous memory (this copy is not needed in C++0x)
std::vector<wchar_t> raw_str( input_string.begin(), input_string.end() );
XMLSize_t len;
// encode the raw wchar_t bytes; the output depends on the platform's wchar_t size and byte order
XMLByte* data_encoded = xercesc::Base64::encode( reinterpret_cast<const XMLByte*>(&raw_str[0]), raw_str.size()*sizeof(wchar_t), &len );
// convert the Base64 bytes into a Xerces XMLCh string
XMLCh* text_encoded = xercesc::XMLString::transcode( reinterpret_cast<char*>(data_encoded) );
// text_encoded now holds the encoded text
// do something with text_encoded
xercesc::XMLString::release( &text_encoded );
xercesc::XMLString::release( reinterpret_cast<char**>(&data_encoded) );

Kirill V. Lyadvinsky
-
This is a solution where no intermediate form such as UTF-8 is used. – Mike DeSimone May 23 '11 at 14:20
-
Very useful code showing how to release the memory allocated by xercesc – Damian Nov 17 '17 at 01:15
0
If you are using Visual C++ with MFC, there are already library functions for this. Check out `Base64Encode` and `Base64Decode` (declared in ATL's `atlenc.h`).

Jonathan Wood