
I am facing an issue with the JTidy parser when it processes the following Chinese content:

<para>所示回报信息以美元表述,并且用如上所示的股份类别进行计算,已扣除所有基金运营费用,e 未扣除销售 费用。</para>

After parsing, the output contains an extra "e" right after the original "e" character, like this:

<para>所示回报信息以美元表述,并且用如上所示的股份类别进行计算,已扣除所有基金运营费用,ee 未扣除销售 费用。</para>

I am using the latest version of JTidy (jtidy-r938.jar).
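To narrow down where the duplicated character comes from, a small standard-library check like the one below may help. It is only a sketch: the class and method names (`FirstDifference`, `firstDifference`, `describe`) are hypothetical, the sample strings are just the tail of the text above written with Unicode escapes, and it assumes the input is converted to bytes as UTF-8. It does not call JTidy at all. It only confirms that the UTF-8 round trip leaves the string unchanged and reports the first index at which the parsed output differs from the input.

```java
import java.nio.charset.StandardCharsets;

class FirstDifference {

    /** Returns the index of the first differing char, or -1 if the strings are equal. */
    static int firstDifference(String a, String b) {
        int limit = Math.min(a.length(), b.length());
        for (int i = 0; i < limit; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        // One string is a prefix of the other, or they are identical.
        return a.length() == b.length() ? -1 : limit;
    }

    /** Formats the code point at the given index as U+XXXX, or "<end>" if out of range. */
    static String describe(String s, int index) {
        if (index < 0 || index >= s.length()) {
            return "<end>";
        }
        return String.format("U+%04X", s.codePointAt(index));
    }

    public static void main(String[] args) {
        // Tail of the sample text, written with Unicode escapes so the source stays ASCII.
        String input = "\u8D39\u7528,e \u672A";
        // Stand-in for the string read back from the parser's output stream.
        String output = "\u8D39\u7528,ee \u672A";

        // Check that encoding to UTF-8 bytes and decoding again does not alter the text.
        String roundTripped =
                new String(input.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
        System.out.println("UTF-8 round trip intact: " + input.equals(roundTripped));

        int index = firstDifference(input, output);
        System.out.println("first difference at index " + index
                + ": input " + describe(input, index)
                + ", output " + describe(output, index));
    }
}
```

If the round trip is intact and the first difference sits exactly at the extra "e", the duplication most likely happens inside the parser rather than in the byte conversion, although this check alone cannot prove that.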

Uwe Plonus
AnilGoud
  • What is your question? Also, can you please change `chines` to `Chinese`? – OPK May 05 '15 at 12:38
  • After parsing, an extra "e" is introduced into the content. If you compare the two snippets above, you will see the change. – AnilGoud May 05 '15 at 12:48
  • What's the version number of JTidy you're using? "Latest" requires readers to take a guess wrt the current releases, and will be out of date if someone reads the question later. What did you run to do the parsing? – Andrew Janke May 05 '15 at 13:08
  • I am using jtidy-r938.jar, and the following is my code: `InputStream is = new ByteArrayInputStream(str.getBytes("UTF-8"));` `Tidy tidy = new Tidy();` `tidy.setXHTML(true);` `tidy.setShowWarnings(true);` `ByteArrayOutputStream baos = new ByteArrayOutputStream();` `tidy.setInputEncoding("UTF-8");` `tidy.setOutputEncoding("UTF-8");` `tidy.parse(is, baos);` `String html = baos.toString("UTF-8");` – AnilGoud May 05 '15 at 15:12
  • My actual content is:

    所示回报信息以美元表述,并且用如上所示的股份类别进行计算,已扣除所有基金运营费用,e

    – AnilGoud May 06 '15 at 10:40

0 Answers