Questions tagged [jtidy]

JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used to clean up malformed and faulty HTML. In addition, JTidy provides a DOM interface to the document being processed, so it can also serve as a DOM parser for real-world HTML.

JTidy was written by Andy Quick, who later stepped down from the maintainer position. Now JTidy is maintained by a group of volunteers.
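
Because the DOM interface mentioned above hands back a standard `org.w3c.dom.Document`, ordinary DOM code applies to it. The sketch below is only an illustration of that idea: it collects `href` values from anchor elements. To stay runnable with the JDK alone, it builds the `Document` with the JDK's own XML parser from already well-formed markup. That stand-in parser, the class name `DomLinkSketch`, and both method names are assumptions made for this example, not part of JTidy. With JTidy on the classpath, the `Document` would instead come from a call such as `new Tidy().parseDOM(in, null)`, and how closely JTidy's DOM matches the JDK's behaviour (for example for absent attributes) should be checked against the version in use.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DomLinkSketch {

    // Collects the href value of every <a> element that has a non-empty href.
    public static List<String> hrefs(Document doc) {
        List<String> out = new ArrayList<>();
        NodeList anchors = doc.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            Element a = (Element) anchors.item(i);
            // Per the DOM specification, getAttribute returns "" when the attribute is absent.
            String href = a.getAttribute("href");
            if (!href.isEmpty()) {
                out.add(href);
            }
        }
        return out;
    }

    // Stand-in for JTidy (assumption): parses markup that is ALREADY well-formed
    // using the JDK XML parser. Real-world HTML would need a tolerant parser.
    public static Document parseWellFormed(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseWellFormed(
                "<html><body>"
                + "<a href=\"/one\">1</a>"
                + "<a name=\"x\">no link</a>"
                + "<a href=\"/two\">2</a>"
                + "</body></html>");
        System.out.println(hrefs(doc)); // prints [/one, /two]
    }
}
```

Keeping the traversal in a method that takes a `Document` means the same code can be reused unchanged whichever parser produced the tree.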

Official Website: http://jtidy.sourceforge.net/

97 questions
2
votes
3 answers

Convert html to xml using java

Can anyone suggest the best approach for converting HTML to XML using Java? Is there an API available for that? The HTML might also contain JavaScript code. I have tried the code below: import java.io.BufferedInputStream; import…
suresh
  • 35
  • 1
  • 1
  • 4
2
votes
4 answers

How to change HTML tag content in Java?

How can I change HTML content of tag in Java? For example: before:
text
**text**
text
after: …
bugisoft
  • 31
  • 1
  • 1
  • 4
2
votes
1 answer

using JTidy with Maven2

I am working on a Java project using spring2 and Maven. I have already incorporated JSLint4Java into Maven, but now find myself needing to do some further validation. There are a number of core pages in the build i.e. home page, search page etc.…
Simon Kenyon Shepard
  • 849
  • 2
  • 11
  • 23
2
votes
1 answer

Java: Jtidy convertion from html text to xhtml text

I am using JTidy and I want to give it a string as input instead of a file. Is that possible? How can I do that? This is my code: FileInputStream fis =null; String htmlFileName = "report.html"; //from html to xhtml try { …
mohammad
  • 2,142
  • 7
  • 35
  • 60
2
votes
2 answers

How to Remove all output from JTidy?

I'm using JTidy to clean up some XML, like this: Tidy tidy = new Tidy(); tidy.setXmlOut(true); tidy.setShowWarnings(false); tidy.parse(new FileInputStream(strStrippedHTMLPath), new FileOutputStream(strXMLPath)); The problem is that it always…
sudo
  • 319
  • 2
  • 4
  • 10
1
vote
1 answer

Pretty formatting HTML5 output

I am trying to automatically indent the HTML5 output. The tool which I tried to use was JTidy, but the problem is that it does not support HTML5 elements and for instance it moves all and to header whereas HTML5 use them in the body. As HTML is…
Vojtěch
  • 11,312
  • 31
  • 103
  • 173
1
vote
2 answers

How to let jtidy not convert Chinese characters into html entities?

I have some html to convert by jtidy, which contains some Chinese characters: 怎么回事 But the result looks like: 怎么回事 How to configure jtidy and let it…
Freewind
  • 193,756
  • 157
  • 432
  • 708
1
vote
2 answers

Parsing links with JTidy

I am currently using JTidy to parse an HTML document and fetch a collection of all anchor tags in the given HTML document. I then extract the value of each tag's href attribute to come up with a collection of links on the page. Unfortunately, these…
Andrew Keller
  • 3,198
  • 5
  • 36
  • 51
1
vote
1 answer

Issue with title text in Java

I have used Jtidy parser in java to fetch the title text. String titleText=null; try { titleText = doc.getElementsByTagName("title").item(0) .getFirstChild().getNodeValue(); } catch (Exception e1) { try { titleText =…
DJ31
  • 1,219
  • 3
  • 14
  • 19
1
vote
2 answers

How to open particular link on clicking a image in java?

I am using Jtidy parser to get the image from web page in java. URL url = new URL("www.yahoo.com"); HttpURLConnection conn = (HttpURLConnection) url.openConnection(); InputStream in = conn.getInputStream(); Document doc = new…
DJ31
  • 1,219
  • 3
  • 14
  • 19
1
vote
1 answer

JTidy node processing

I'm using JTidy to parse web page data. My question is the following: is it possible to call the XPath.evaluate method on a previously retrieved node? I'll explain better. Usually you use the xmlPath.evaluate(pattern, document,…
user278064
  • 9,982
  • 1
  • 33
  • 46
1
vote
1 answer

problem in reading tag from web page in java

I am using jtidy parser to parse the web page. It is working, sort of: InputStream in=new URL("http://www.medicinenet.com/alopecia_areata/article.htm").openStream(); Document doc= new Tidy().parseDOM(in, null); String…
Tags: java, jtidy
asked May 14 '11 at 07:10
DJ31
  • 1,219
  • 3
  • 14
  • 19
1
vote
1 answer

JTidy not handling some characters correctly

Certain characters get mangled after I call Tidy.parse. Two examples are: ’ instead of ' and ∼ instead of ~ I'm guessing that these must have come from Word or something similar but the tidy handles them very badly. Specifically, it converts them…
Tags: jtidy
asked Apr 16 '19 at 15:00
ArcticDoom
  • 64
  • 6
1
vote
1 answer

Convert word xml to html and html to word xml(Using Java)

I tried some ways(Jtidy) to Convert word XML to HTML and HTML to word XML through JAVA. But missing some word properties in Final word XML file. Note: We have worked XML tags based on a schema. Is there a better way to convert Word XML to HTML?…
Tags: java, xml, ms-word, xsd, jtidy
asked Mar 11 '19 at 06:03
sathish Dais
  • 11
  • 1
1
vote
0 answers

JTidy issue with numbered list item

I am facing a weird problem with Numbered List Item while generating pdf using IText. The serial number of the item list is not incremented by one when a <br/> tag is appended. consider the following example: String withoutBrTag =…
Tags: java, itext, pdf-generation, jtidy, numbered-list
asked Nov 11 '18 at 04:50
Erfan Ahmed
  • 1,536
  • 4
  • 19
  • 34