You stated you had used the following:
StringEscapeUtils.escapeHtml4(text);
Instead try this:
StringEscapeUtils.unescapeHtml4(text);
You were re-encoding the HTML entities rather than decoding them.
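To see why the direction matters, here is a minimal sketch that needs no library. The `escape` and `unescape` methods are hypothetical stand-ins that only handle four entities; they are not how Commons Lang implements `escapeHtml4` and `unescapeHtml4`, but they show the same effect on text that is already encoded.

```java
public class EntityRoundTrip {

    // Hypothetical stand-in for escapeHtml4: encodes only & < > "
    public static String escape(String s) {
        return s.replace("&", "&amp;")   // first, so the entities added below are not re-encoded
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Hypothetical stand-in for unescapeHtml4: decodes the same four entities
    public static String unescape(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&amp;", "&");  // last, so "&amp;lt;" decodes to "&lt;" and not "<"
    }

    public static void main(String[] args) {
        String alreadyEncoded = "Tom &amp; Jerry &lt;3";

        // Escaping encoded text encodes the ampersands a second time
        System.out.println(escape(alreadyEncoded));   // Tom &amp;amp; Jerry &amp;lt;3

        // Unescaping gives back the readable text
        System.out.println(unescape(alreadyEncoded)); // Tom & Jerry <3
    }
}
```

If your output contains sequences like `&amp;amp;`, that is usually a sign the text was escaped when it should have been unescaped.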
Documentation here:
https://commons.apache.org/proper/commons-lang/javadocs/api-release/org/apache/commons/lang3/StringEscapeUtils.html
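// Requires Apache Commons Lang 3 (http://commons.apache.org)
import org.apache.commons.lang3.StringEscapeUtils;

// Removes anything that looks like an HTML tag, decodes the remaining
// entities (e.g. &amp; becomes &), and trims surrounding whitespace.
public static String stripHtml(String str) {
    return StringEscapeUtils.unescapeHtml4(str.replaceAll("<[A-Za-z/].*?>", "")).trim();
}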
In addition, the same class has unescape methods for other formats (for example unescapeXml and unescapeJson), along with corresponding escape methods if you need to encode.
This isn't what you asked, but the following may also be useful for URL decoding:
String result = URLDecoder.decode(url, "UTF-8");
API reference here:
http://docs.oracle.com/javase/7/docs/api/java/net/URLDecoder.html
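One thing to watch for: the two-argument `URLDecoder.decode` declares the checked `UnsupportedEncodingException`, so the one-liner above needs a try/catch or a `throws` clause. A minimal sketch of one way to wrap it follows; the class name `UrlDecodeDemo` and the helper `decodeUtf8` are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class UrlDecodeDemo {

    // Hypothetical helper: decodes a URL-encoded string as UTF-8
    public static String decodeUtf8(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this is not expected to happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // %20 and + both decode to a space, %26 decodes to &
        System.out.println(decodeUtf8("a%20b+c%26d")); // a b c&d
    }
}
```

Note that `URLDecoder` follows the form-encoding rules, so a literal `+` in the input is turned into a space.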