
I am parsing an HTML string with Jsoup to extract just the text, and I need the exact text as it appears in the source. However, when I parse strings that include escaped characters, Jsoup unescapes them. For example, if I parse

<p>Let&#39;s try</p>

Jsoup returns

<p>Let's try</p>

I searched extensively for a solution and tried doc.outputSettings() with different charset and escapeMode options, but I couldn't get Jsoup to leave the escaped HTML special characters as they were.

Adi Gutner
  • Why does it matter? The HTML has exactly the same meaning either way. – Quentin Jun 27 '21 at 12:17
  • After extracting the text, I run some manipulations on it, and then I want to find and replace it in the original string. Because of the unescaping, I can't find the extracted text. – Adi Gutner Jun 27 '21 at 12:55

1 Answer


Judging from the following comment and the current EscapeMode documentation, this is not possible with Jsoup:

I'm never going to implement EscapeMode.none as it only leads to broken parse trees.
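Since Jsoup itself will not preserve the original escapes, the use case from the comments (finding the extracted text again in the original string) can be handled outside Jsoup by decoding the raw string yourself while remembering where each decoded character came from. The following is a minimal JDK-only sketch, not a Jsoup feature; `RawOffsetFinder`, `decodeReference`, `findInRaw` and `replaceInRaw` are hypothetical names. It assumes that the searched text lies inside a single text node (no tags in between), that only numeric references (`&#39;`, `&#x27;`) and a few named entities occur, and that whitespace has not been collapsed, which Jsoup's `text()` does, so normalized spaces would need extra handling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class RawOffsetFinder {

    // Assumption: only a handful of named entities are recognized; the full HTML set is much larger.
    private static final Map<String, String> NAMED = Map.of(
            "amp", "&", "lt", "<", "gt", ">", "quot", "\"", "apos", "'", "nbsp", "\u00A0");

    /** Decoded text plus, for every decoded char, the raw [start, end) range it came from. */
    static final class Decoded {
        final String text;
        final int[] rawStart;
        final int[] rawEnd;

        Decoded(String text, int[] rawStart, int[] rawEnd) {
            this.text = text;
            this.rawStart = rawStart;
            this.rawEnd = rawEnd;
        }
    }

    /** Decodes "#39", "#x27" or a known name; returns null if the reference is not recognized. */
    static String decodeReference(String name) {
        try {
            if (name.startsWith("#x") || name.startsWith("#X")) {
                return Character.toString(Integer.parseInt(name.substring(2), 16));
            }
            if (name.startsWith("#")) {
                return Character.toString(Integer.parseInt(name.substring(1)));
            }
        } catch (IllegalArgumentException e) { // also covers NumberFormatException
            return null; // malformed or out-of-range reference: keep it as literal text
        }
        return NAMED.get(name);
    }

    static Decoded decode(String raw) {
        StringBuilder text = new StringBuilder();
        List<Integer> starts = new ArrayList<>();
        List<Integer> ends = new ArrayList<>();
        int i = 0;
        while (i < raw.length()) {
            int semi = raw.charAt(i) == '&' ? raw.indexOf(';', i) : -1;
            String ref = semi > i + 1 ? decodeReference(raw.substring(i + 1, semi)) : null;
            String piece = ref != null ? ref : String.valueOf(raw.charAt(i));
            int next = ref != null ? semi + 1 : i + 1;
            for (int k = 0; k < piece.length(); k++) { // a supplementary code point yields 2 chars
                starts.add(i);
                ends.add(next);
            }
            text.append(piece);
            i = next;
        }
        return new Decoded(text.toString(),
                starts.stream().mapToInt(Integer::intValue).toArray(),
                ends.stream().mapToInt(Integer::intValue).toArray());
    }

    /** Raw [start, end) range of the first occurrence of the decoded needle, or null if absent. */
    public static int[] findInRaw(String raw, String decodedNeedle) {
        if (decodedNeedle.isEmpty()) return null;
        Decoded d = decode(raw);
        int at = d.text.indexOf(decodedNeedle);
        if (at < 0) return null;
        return new int[] {d.rawStart[at], d.rawEnd[at + decodedNeedle.length() - 1]};
    }

    /** Replaces the raw span behind the first match; rawReplacement is inserted verbatim. */
    public static String replaceInRaw(String raw, String decodedNeedle, String rawReplacement) {
        int[] span = findInRaw(raw, decodedNeedle);
        if (span == null) return raw;
        return raw.substring(0, span[0]) + rawReplacement + raw.substring(span[1]);
    }

    public static void main(String[] args) {
        String raw = "<p>Let&#39;s try</p>";
        int[] span = findInRaw(raw, "Let's try"); // the unescaped text extracted earlier
        System.out.println(raw.substring(span[0], span[1])); // Let&#39;s try
        System.out.println(replaceInRaw(raw, "Let's", "Let us")); // <p>Let us try</p>
    }
}
```

Because the search runs on the decoded text but the slicing happens on the original string, the `&#39;` outside the replaced range is never re-serialized, so its exact spelling survives. The trade-off is that the entity handling in this sketch has to roughly match what Jsoup's parser decodes, or some matches will be missed.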

A_A