What is this text encoding format? Are there any built-in methods to deal with it in C++?

Question

By "deal with" I mean, for example, converting it to a proper std::wstring with all letters in human-readable form.

Example:

Al elmaya ta\305\237 atan \303\247ok olur.
Al elmaya taş atan çok olur.


Za\305\274\303\263\305\202\304\207 g\304\231\305\233l\304\205 ja\305\272\305\204.
Zażółć gęślą jaźń.

\320\221\320\265\320\264\320\260 \320\275\320\270\320\272\320\276\320\263\320\264\320\260 \320\275\320\265 \320\277\321\200\320\270\321\205\320\276\320\264\320\270\321\202 \320\276\320\264\320\275\320\260.
Беда никогда не приходит одна.

In this case: `\nnn - arbitrary octal value - byte nnn`. For example, `\305` is the octal encoded form of byte `0xC5`, and `\237` is the octal encoded form of byte `0x9F`, and the byte sequence `0xC5 0x9F` is the UTF-8 encoded form of the Unicode `ş` character. — Remy Lebeau, Apr 02 '15 at 22:33
Are there any built-in functions to unescape these strings, or must I write the function myself? — rsk82, Apr 02 '15 at 22:36
Where are you getting these strings from in the first place? It is not common to see octal encoding in a file or user input. It is more likely to be seen in source code, for instance. — Remy Lebeau, Apr 02 '15 at 22:36
@RemyLebeau: From the output of a tar command; see my other question about that: http://stackoverflow.com/questions/29420963/how-to-make-cygwin-tar-output-proper-unicode-letters-instead-of-shashed-values , and maybe I found a solution? http://www.codeproject.com/Questions/692737/How-convert-unicode-escape-sequence-string-to-read — rsk82, Apr 02 '15 at 22:38
That is one way to do it, though you will have to tweak it as that code is meant for JSON, which uses hex format, not octal format. — Remy Lebeau, Apr 02 '15 at 22:42
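
The decoding described in the comments above can be sketched in C++. This is a minimal sketch, not a built-in facility: `unescape_octal` is a hypothetical helper name, and it assumes every escape is a backslash followed by exactly three octal digits, as in the examples. Other escapes (such as an escaped backslash) are not handled. The result is a UTF-8 byte string; converting that to a std::wstring is a separate step.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a standard library function): replaces each
// backslash followed by exactly three octal digits (\000 to \377) with the
// single byte it encodes. Everything else, including a lone backslash,
// is copied through unchanged.
std::string unescape_octal(const std::string& in) {
    auto is_octal = [](char c) { return c >= '0' && c <= '7'; };

    std::string out;
    out.reserve(in.size());

    for (std::size_t i = 0; i < in.size(); ++i) {
        const bool is_escape =
            in[i] == '\\' && i + 3 < in.size() &&
            is_octal(in[i + 1]) && in[i + 1] <= '3' &&  // keep value <= 0377
            is_octal(in[i + 2]) && is_octal(in[i + 3]);

        if (is_escape) {
            // Combine the three octal digits into one byte value.
            const int value = ((in[i + 1] - '0') << 6) |
                              ((in[i + 2] - '0') << 3) |
                              (in[i + 3] - '0');
            out.push_back(static_cast<char>(value));
            i += 3;  // skip the digits just consumed
        } else {
            out.push_back(in[i]);
        }
    }
    return out;
}
```

For instance, calling `unescape_octal("ta\\305\\237")` (as written in C++ source) yields `ta` followed by the bytes `0xC5 0x9F`, which is the UTF-8 form of `taş`.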

0 Answers