I have a scraped string in this format:

Un peque\u00F1o jard\u00EDn

And I need this:

Un pequeño jardín

The web page has a meta tag declaring charset=utf-8:

<meta http-equiv="content-type" content="text/html; charset=utf-8">

I tried to resolve it with:

// Original text after regex capture
string text = "Un peque\\u00F1o jard\\u00EDn";
// Result 1: Un peque\\u00F1o jard\\u00EDn
string res1 = Encoding.UTF8.GetString(Encoding.UTF8.GetBytes(text));
// Result 2: Un peque\\u00F1o jard\\u00EDn
string res2 = System.Net.WebUtility.HtmlDecode(text);

I think this encoding is big-endian UTF-16. I tried Encoding.BigEndianUnicode and other encodings, but got unexpected results.

How can I decode to "Un pequeño jardín"?

Thanks for your time!

Duefectu
  • That text is mangled. A Unicode string would look like this: `"Un peque\u00F1o jard\u00EDn"`, but you have this: `"Un peque\\u00F1o jard\\u00EDn"`. The difference is subtle but crucial: the `\\`. When a char is escaped it is prefixed with `\`; in this case the char would be `\u00F1`, a Unicode char, but something has unescaped the content incorrectly and expanded the special chars to their text representation. Also, the correct encoding for that type of code is Unicode. – Gusman Oct 11 '17 at 11:03
  • @Gusman, I tried to replace \\ with \ using **Replace("\\\\","\\")**, but it didn't work. – Duefectu Oct 11 '17 at 11:22
  • It worked with **System.Text.RegularExpressions.Regex.Unescape(text);** as suggested in https://stackoverflow.com/questions/9303257/how-to-decode-a-unicode-character-in-a-string – Duefectu Oct 11 '17 at 11:30
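
For anyone landing here, a minimal sketch of the fix from the last comment. The string is not in a different byte encoding; it contains the literal characters `\`, `u`, `0`, `0`, `F`, `1`, so re-encoding with `Encoding.*` cannot change it. The class and method names below (`EscapeDecoder`, `DecodeAll`, `DecodeUnicodeOnly`) are made up for illustration. `DecodeAll` just wraps `Regex.Unescape`, which also interprets other backslash escapes such as `\n` or `\t`; `DecodeUnicodeOnly` is an alternative, not taken from the thread, that touches only `\uXXXX` sequences.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper class, named here only for illustration.
public static class EscapeDecoder
{
    // Converts every backslash escape that Regex.Unescape understands,
    // including \uXXXX sequences, into the character it stands for.
    public static string DecodeAll(string text)
    {
        return Regex.Unescape(text);
    }

    // Converts only \uXXXX sequences (exactly four hex digits) and
    // leaves every other backslash in the input untouched.
    public static string DecodeUnicodeOnly(string text)
    {
        return Regex.Replace(
            text,
            @"\\u([0-9A-Fa-f]{4})",
            m => ((char)int.Parse(m.Groups[1].Value, NumberStyles.HexNumber)).ToString());
    }
}
```

With the string from the question, `EscapeDecoder.DecodeAll("Un peque\\u00F1o jard\\u00EDn")` returns `Un pequeño jardín`, and so does `DecodeUnicodeOnly`. The second variant may be the safer choice if the scraped text can contain backslashes that are not meant as escapes.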
