Background
I need to extract a URL from a string found in HTML (it seems to be embedded in JSON), so I tried using org.apache.commons.text.StringEscapeUtils.unescapeJson.
Here is the beginning of one such URL, as it appears in the input:
https:\/\/scontent.cdninstagram.com\/v\/t51.2885-19\/40405422_462181764265305_1222152915674726400_n.jpg?stp=dst-jpg_s150x150\\u0026
The problem
It seems that some escape sequences aren't handled. If I run this:
val test="https:\\/\\/scontent.cdninstagram.com\\/v\\/t51.2885-19\\/40405422_462181764265305_1222152915674726400_n.jpg?stp=dst-jpg_s150x150\\\\u0026\n"
Log.d("AppLog", "${StringEscapeUtils.unescapeJson(test)}")
the result is:
https://scontent.cdninstagram.com/v/t51.2885-19/40405422_462181764265305_1222152915674726400_n.jpg?stp=dst-jpg_s150x150\u0026
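As a sanity check, counting the backslashes in the test literal may help explain this output. The sketch below uses only the Kotlin standard library; `countBackslashes` is a made-up helper name, and the strings are shortened versions of the tail of the test input. It suggests the "u0026" part of the input is preceded by two real backslashes, so it is escaped twice, and a single unescape pass would be expected to remove only one level.

```kotlin
/** Hypothetical helper: counts the literal backslash characters in a string. */
fun countBackslashes(text: String): Int = text.count { it == '\\' }

fun main() {
    // Kotlin source "\\\\u0026" is the text \\u0026 (two real backslashes),
    // which is what the test literal above ends with.
    val fromTest = "s150x150\\\\u0026"
    // Kotlin source "\\u0026" is the text \u0026 (one real backslash),
    // which is what a normally escaped JSON string would contain.
    val singleEscaped = "s150x150\\u0026"

    println(countBackslashes(fromTest))      // 2
    println(countBackslashes(singleEscaped)) // 1
}
```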
The result still contains "\u0026", so I found that this solves it:
StringEscapeUtils.unescapeJson(input).replace("\\u0026","&").replace("\\/", "/")
This works, but I think I should use something more official, since blindly replacing substrings might break on other inputs.
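One alternative to hard-coded replacements, assuming the input really is escaped twice, would be to run unescapeJson more than once. The following is only a sketch: `unescapeJsonRepeatedly` and its `maxPasses` parameter are hypothetical names, and the example.com URL is a shortened stand-in for the real one. Note that extra passes could over-decode text that was meant to keep a literal backslash, so the pass count is capped.

```kotlin
import org.apache.commons.text.StringEscapeUtils

/**
 * Hypothetical helper: applies unescapeJson up to maxPasses times,
 * stopping early once a pass no longer changes the text.
 */
fun unescapeJsonRepeatedly(input: String, maxPasses: Int = 2): String {
    var current = input
    repeat(maxPasses) {
        val next = StringEscapeUtils.unescapeJson(current)
        if (next == current) return current // nothing left to unescape
        current = next
    }
    return current
}

fun main() {
    // Shortened stand-in for the test input: \/ separators and a doubly escaped \\u0026.
    val test = "https:\\/\\/example.com\\/a.jpg?stp=x\\\\u0026"
    println(unescapeJsonRepeatedly(test)) // https://example.com/a.jpg?stp=x&
}
```

With `maxPasses = 1` this behaves like a plain unescapeJson call and leaves "\u0026" in place, which matches the result shown earlier.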
What I've tried
Looking at the unescapeJson code (which seems to be the same for Java and JSON), I thought that maybe I could just add the rules:
/** Based on StringEscapeUtils.unescapeJson, but with 2 more rules added. */
fun unescapeUrl(input: String): String {
    val unescapeJavaMap = hashMapOf<CharSequence, CharSequence>(
        "\\\\" to "\\",
        "\\\"" to "\"",
        "\\'" to "'",
        "\\" to StringUtils.EMPTY,
        // added rules:
        "\\u0026" to "&",
        "\\/" to "/"
    )
    val aggregateTranslator = AggregateTranslator(
        OctalUnescaper(),
        UnicodeUnescaper(),
        LookupTranslator(EntityArrays.JAVA_CTRL_CHARS_UNESCAPE),
        LookupTranslator(Collections.unmodifiableMap(unescapeJavaMap))
    )
    return aggregateTranslator.translate(input)
}
This doesn't work. It leaves the string with "\u0026" in it.
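A possible explanation, offered as a guess from reading the translator classes rather than a confirmed diagnosis: the Kotlin literal "\\u0026" is the text \u0026 with a single backslash, but the input has two backslashes at that position. At the first backslash UnicodeUnescaper sees another backslash instead of "u", so it does not apply, and the lookup rule for two backslashes then consumes both and emits one, leaving "\u0026" behind. The sketch below keys the added rule on the doubly escaped text instead. `unescapeDoublyEscapedUrl` is a hypothetical name, and the example.com URL is a shortened stand-in.

```kotlin
import org.apache.commons.text.translate.AggregateTranslator
import org.apache.commons.text.translate.EntityArrays
import org.apache.commons.text.translate.LookupTranslator
import org.apache.commons.text.translate.OctalUnescaper
import org.apache.commons.text.translate.UnicodeUnescaper

/** Hypothetical variant of unescapeUrl with a rule for the doubly escaped ampersand. */
fun unescapeDoublyEscapedUrl(input: String): String {
    val rules = mapOf<CharSequence, CharSequence>(
        // Text \\u0026 (two backslashes). LookupTranslator tries the longest
        // key first, so this wins over the two-backslash rule below.
        "\\\\u0026" to "&",
        "\\\\" to "\\",
        "\\\"" to "\"",
        "\\'" to "'",
        "\\" to "" // a lone backslash is dropped, which also turns \/ into /
    )
    val translator = AggregateTranslator(
        OctalUnescaper(),
        UnicodeUnescaper(), // still handles the singly escaped \u0026
        LookupTranslator(EntityArrays.JAVA_CTRL_CHARS_UNESCAPE),
        LookupTranslator(rules)
    )
    return translator.translate(input)
}

fun main() {
    val test = "https:\\/\\/example.com\\/a.jpg?stp=x\\\\u0026"
    println(unescapeDoublyEscapedUrl(test)) // https://example.com/a.jpg?stp=x&
}
```

This only covers the one code point that is hard-coded, so any other doubly escaped sequence would still come out with a leftover backslash.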
The questions
What did I do wrong here, and how can I fix it?
Is it really better to use something similar to the original code instead of "replace"?
BTW, I'm using this on Android with Kotlin, but the same applies to Java on a PC.