I am using Jsoup 1.9.2 to process some XML input and strip specific tags from it. While doing this, I noticed that Jsoup behaves strangely when it is asked to clean title tags. Specifically, other XML tags inside the title tag are not removed; instead, they are replaced by their escaped forms.
I wrote a short unit test for this, shown below. The test fails because the output comes out as CuCl&lt;sub&gt;2&lt;/sub&gt; instead of CuCl2.
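import static org.junit.Assert.assertEquals;
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;
import org.junit.Test;

@Test
public void stripXmlSubInTitle() {
    final String input = "<title>CuCl<sub>2</sub></title>";
    // Whitelist.none() should strip every tag and keep only the text.
    final String output = Jsoup.clean(input, Whitelist.none());
    assertEquals("CuCl2", output);
}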
If the title tag is replaced with another tag (e.g., p or div), everything works as expected. Any explanation or workaround would be appreciated.
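One plausible explanation, assuming Jsoup follows the HTML5 parsing rules here: Jsoup.clean parses its input with the HTML parser, and HTML5 treats title as a text-only (RCDATA) element, so everything between <title> and </title> becomes literal text rather than child elements. The cleaner then drops the title element, keeps that text, and escapes the angle brackets on output. A possible workaround, sketched below, is to parse the input with Jsoup's XML parser (Parser.xmlParser()), which has no special handling for title, and take the document's text. The class and method names (XmlTitleText, stripTags) are hypothetical, chosen only for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

public class XmlTitleText {

    // Hypothetical helper: strips all tags from XML input and returns the text.
    // It uses the XML parser instead of the HTML parser that Jsoup.clean uses,
    // so <title> is an ordinary element and <sub> is parsed as a real child.
    static String stripTags(String xml) {
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
        // text() concatenates the text of all descendants, dropping every tag.
        return doc.text();
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<title>CuCl<sub>2</sub></title>"));
    }
}
```

Note that text() returns plain text (entities decoded, whitespace normalized), whereas Jsoup.clean returns escaped HTML, so the two differ if the input contains characters such as & or <.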