JTidy parsing issue with Chinese content

Question

I am facing an issue with the JTidy parser when it processes the following Chinese content:

<para>所示回报信息以美元表述，并且用如上所示的股份类别进行计算，已扣除所有基金运营费用，e 未扣除销售 费用。</para>

After parsing, the output contains an extra "e" after the character "e", like this:

<para>所示回报信息以美元表述，并且用如上所示的股份类别进行计算，已扣除所有基金运营费用，ee 未扣除销售 费用。</para>.

I am using the latest version of JTidy.
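To pin down exactly where the two strings above diverge, a small comparison helper can be useful. The following is only a sketch in plain Java with no JTidy dependency; the class name `FirstDifference` and its methods are hypothetical names chosen for illustration. It reports the first index at which the input and the parsed output differ, along with the code point found at that position in each string.

```java
class FirstDifference {

    /**
     * Returns the index (in UTF-16 code units) of the first position where
     * the two strings differ, or -1 if they are identical.
     */
    static int firstDifference(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        // One string is a prefix of the other, or they are equal.
        return a.length() == b.length() ? -1 : n;
    }

    /** Formats the code point at index i as U+XXXX, or marks the end of the string. */
    static String describe(String s, int i) {
        if (i < 0 || i >= s.length()) {
            return "<end of string>";
        }
        return String.format("U+%04X", s.codePointAt(i));
    }

    public static void main(String[] args) {
        // The input and the parsed output as quoted in the question.
        String input = "<para>所示回报信息以美元表述，并且用如上所示的股份类别进行计算，已扣除所有基金运营费用，e 未扣除销售 费用。</para>";
        String output = "<para>所示回报信息以美元表述，并且用如上所示的股份类别进行计算，已扣除所有基金运营费用，ee 未扣除销售 费用。</para>";

        int i = firstDifference(input, output);
        System.out.println("first difference at index " + i
                + ": input " + describe(input, i)
                + ", output " + describe(output, i));
    }
}
```

Printing code points rather than the characters themselves makes it easier to tell an ASCII "e" from a look-alike character, and to see whether the character after the "e" in the input is an ordinary space.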

What is your question? Also, can you please change `chines` to `Chinese`? — OPK, May 05 '15 at 12:38
After parsing, an extra "e" is introduced into the content. If you compare the two snippets above, you will see the difference. — AnilGoud, May 05 '15 at 12:48
What's the version number of JTidy you're using? "Latest" requires readers to take a guess wrt the current releases, and will be out of date if someone reads the question later. What did you run to do the parsing? — Andrew Janke, May 05 '15 at 13:08
I am using jtidy-r938.jar, and the following is my code: `InputStream is = new ByteArrayInputStream(str.getBytes("UTF-8")); Tidy tidy = new Tidy(); tidy.setXHTML(true); tidy.setShowWarnings(true); ByteArrayOutputStream baos = new ByteArrayOutputStream(); tidy.setInputEncoding("UTF-8"); tidy.setOutputEncoding("UTF-8"); tidy.parse(is, baos); String html = baos.toString("UTF-8");` — AnilGoud, May 05 '15 at 15:12
My actual content is:
所示回报信息以美元表述，并且用如上所示的股份类别进行计算，已扣除所有基金运营费用，e — AnilGoud, May 06 '15 at 10:40

0 Answers