Decoding a multiply encoded string

Question

How do I decode this to get the result below?

/browse_ajax?action_continuation=1\u0026amp;continuation=4qmFsgJAEhhVQ2ZXdHFQeUJNR183aTMzT2VlTnNaWncaJEVnWjJhV1JsYjNNZ0FEZ0JZQUZxQUhvQk03Z0JBQSUzRCUzRA%253D%253D

/browse_ajax?action_continuation=1&continuation=4qmFsgJAEhhVQ2ZXdHFQeUJNR183aTMzT2VlTnNaWncaJEVnWjJhV1JsYjNNZ0FEZ0JZQUZxQUhvQk03Z0JBQSUzRCUzRA%253D%253D

I've tried the following, including calling them several times, since I read that strings can be encoded more than once.

System.Text.RegularExpressions.Regex.Unescape(string)
System.Uri.UnescapeDataString(string)
System.Net.WebUtility.UrlDecode(string)

Which is the right function here, or rather, in what order do I need to call them to get that result? Because the strings vary, they may contain other special characters, so working around it by editing the string myself is too risky.

The string has to be decoded to work with new System.Net.WebClient().DownloadString(string).

EDIT: It turns out the statement above is wrong: I do not have to decode the string to use WebClient.DownloadString(string). However, the downloaded string suffers from similar encoding. Setting the WebClient's Encoding property to UTF8 before downloading does most of the job, but some characters still come out wrong. For example, double quotes and ampersands remain \u0026quot; and \u0026amp;.

I don't know how to turn \u0026 into &, so that I can then change &amp; to &.

If you've found the answer to your question then you should post it as *an answer* not as an edit to the question. — Servy, Jun 26 '17 at 19:26

score 0 · Answer 1 · answered Jun 05 '17 at 19:17

That these strings are double (actually triple) encoded in this way is a sign that the string is not being encoded correctly. If you own the code that encodes these strings, consider solving this problem there, which is the root of the issue.

That said, here are the decoding calls you need to make to decode this. I do not recommend this solution, as it is definitely a workaround. Again, the problematic behavior is in the code doing the encoding.

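// Note: in a regular C# string literal the compiler reads \u0026 as '&', so val starts out containing "&amp;".
// If your runtime string holds a literal backslash-u sequence, it has to be converted to '&' first.
string val = "/browse_ajax?action_continuation=1\u0026amp;continuation=4qmFsgJAEhhVQ2ZXdHFQeUJNR183aTMzT2VlTnNaWncaJEVnWjJhV1JsYjNNZ0FEZ0JZQUZxQUhvQk03Z0JBQSUzRCUzRA%253D%253D";
val = System.Uri.UnescapeDataString(val);      // %253D -> %3D
val = System.Uri.UnescapeDataString(val);      // %3D -> =
val = System.Web.HttpUtility.HtmlDecode(val);  // &amp; -> &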

This will give you:

/browse_ajax?action_continuation=1&continuation=4qmFsgJAEhhVQ2ZXdHFQeUJNR183aTMzT2VlTnNaWncaJEVnWjJhV1JsYjNNZ0FEZ0JZQUZxQUhvQk03Z0JBQSUzRCUzRA==

If you really want to keep the equal signs percent-encoded, just call Uri.UnescapeDataString(string) once. This will leave the equal signs encoded, but as %3D rather than %253D, which is their proper encoded value.
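
To make the effect of each pass visible, here is a minimal sketch. The class and method names (PercentPasses, Unescape) are hypothetical and exist only for illustration; the only framework call used is Uri.UnescapeDataString.

```csharp
using System;

public static class PercentPasses
{
    // Hypothetical helper: applies Uri.UnescapeDataString the given number of times.
    public static string Unescape(string value, int passes)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));

        for (int i = 0; i < passes; i++)
        {
            // Each call resolves exactly one layer of percent-encoding.
            value = Uri.UnescapeDataString(value);
        }
        return value;
    }
}
```

With the shortened sample value "RA%253D%253D", one pass should give "RA%3D%3D" and two passes should give "RA==".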

R Mac


Since it's not a web application [MSDN](https://msdn.microsoft.com/de-de/library/7c5fyk1k(v=vs.110).aspx) says to use `System.Net.WebUtility.HtmlDecode(string)`. And sadly it results in `/browse_ajax?action_continuation=1\u0026amp;continuation=4qmFsgJAEhhVQ2ZXdHFQeUJNR183aTMzT2VlTnNaWncaJEVnWjJhV1JsYjNNZ0FEZ0JZQUZxQUhvQk03Z0JBQSUzRCUzRA=%3` – Jun 05 '17 at 20:23
Sadly I'm not in charge of fixing that encoding, since this is actual youtube code. Maybe `new System.Net.WebClient().DownloadString(string)` does weird encoding things while downloading? – Jun 05 '17 at 20:31
The URLs provided were put through three encoding passes. I can't tell whether you did it or YouTube did it. Are you loading the literal string given to you by YouTube as a URI? If so, post the code that handles receiving the YouTube response message, extracting the URI, and loading the URI as a Uri object. – R Mac Jun 06 '17 at 12:36
@Kartoffel - Please don't do `new System.Net.WebClient().DownloadString(string)` as `System.Net.WebClient` is an `IDisposable` and should be disposed after use. – Enigmativity Jun 06 '17 at 13:05
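
A minimal sketch of the disposal pattern the last comment describes, combined with the UTF-8 setting mentioned in the question's edit. The class and method names (PageFetcher, Download) are hypothetical; only WebClient, its Encoding property, and DownloadString come from the framework.

```csharp
using System.Net;
using System.Text;

public static class PageFetcher
{
    // Hypothetical helper: downloads a resource as text and disposes the client afterwards.
    public static string Download(string address)
    {
        // The using block calls Dispose() on the WebClient even if DownloadString throws.
        using (var client = new WebClient())
        {
            // Decode the response body as UTF-8, as suggested in the question's edit.
            client.Encoding = Encoding.UTF8;
            return client.DownloadString(address);
        }
    }
}
```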

score 0 · Accepted Answer · answered Jun 26 '17 at 19:30

It looked like the mystery was solved, but I stumbled upon it again and couldn't find a built-in solution: the functions I tried seem to fail to decode the \u0026 escape when it is part of an HTML-escaped character.

Since these escapes only ever seem to involve the ampersand, I had to use Replace(@"\u0026", "&") before calling HtmlDecode to get a proper string.
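
A minimal sketch of that workaround, assuming the input really contains the six literal characters \u0026 (not an actual ampersand) and that System.Net.WebUtility is available. The class and method names (EscapedUrlDecoder, DecodeAmpersands) are made up for illustration.

```csharp
using System;
using System.Net;

public static class EscapedUrlDecoder
{
    // Hypothetical helper: turns the literal text \u0026 into '&',
    // then resolves HTML entities such as &amp; and &quot;.
    public static string DecodeAmpersands(string raw)
    {
        if (raw == null) throw new ArgumentNullException(nameof(raw));

        // The verbatim string @"\u0026" matches a literal backslash followed by u0026.
        string withAmpersands = raw.Replace(@"\u0026", "&");

        // HtmlDecode leaves percent-encoding such as %253D untouched.
        return WebUtility.HtmlDecode(withAmpersands);
    }
}
```

For example, passing @"/browse_ajax?action_continuation=1\u0026amp;continuation=abc%253D" should return /browse_ajax?action_continuation=1&continuation=abc%253D, which keeps the percent-encoded tail as in the result the question asks for.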
