Questions tagged [jtidy]

JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used as a tool for cleaning up malformed and faulty HTML. In addition, JTidy provides a DOM interface to the document being processed, which lets you use JTidy as a DOM parser for real-world HTML.

JTidy was written by Andy Quick, who later stepped down from the maintainer position. Now JTidy is maintained by a group of volunteers.
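As a rough illustration of what the DOM interface makes possible: once a document is available as an `org.w3c.dom.Document`, the standard JDK XPath API can query it. The sketch below is an assumption-laden stand-in, not JTidy itself. So that it runs without the JTidy jar, it builds the `Document` with the JDK's own `DocumentBuilder` from an already well-formed string; with JTidy the document would instead come from a call such as `new Tidy().parseDOM(in, null)`, as seen in the questions listed here. The class name `DomQueryExample` and its helper methods are hypothetical, and the sketch does not verify how JTidy's own DOM implementation behaves with XPath.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Hypothetical helper class, for illustration only.
class DomQueryExample {

    // Stand-in for JTidy: parses markup that is ALREADY well-formed.
    // With JTidy on the classpath, this step would be replaced by
    // something like: new Tidy().parseDOM(inputStream, null)
    static Document parseWellFormed(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    // Evaluates an XPath expression and returns the string value of the result.
    static String evaluate(Document doc, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><head><title>Demo</title></head>"
                + "<body><table><tr><td>a</td><td>b</td></tr></table></body></html>";
        Document doc = parseWellFormed(xhtml);
        // Page title, then the second cell of the first table row.
        System.out.println(evaluate(doc, "/html/head/title"));
        System.out.println(evaluate(doc, "//table/tr[1]/td[2]"));
    }
}
```

The two helpers are kept separate so that only `parseWellFormed` would need to change when switching to a JTidy-produced document.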

Official Website: http://jtidy.sourceforge.net/

97 questions
3
votes
2 answers

XPath: how to retrieve the value of a table cell from an HTML document

I have an HTML document, and somewhere inside it is the table below. I can get the table rows and Java DOM objects. What is not clear to me is how to extract the value of a table cell, both when the value is a string and when it is a binary…
Androider
  • 21,125
  • 36
  • 99
  • 158
3
votes
0 answers

Create an MHTML file using Java

Can anyone please suggest how to create an MHTML file using Java? I have already used the JTidy API (SourceForge), but I am unable to open the page in a browser. I get a blank screen when I open the MHTML file in IE. Note: File (.mhtml)…
Vasanth
  • 474
  • 2
  • 9
  • 31
3
votes
2 answers

JTidy upgrade broke document xpaths

I just updated to the newest version of JTidy, which came out in October, and it seems to have broken my document object for unknown reasons. This is my code: tidy = new…
giroy
  • 2,203
  • 6
  • 27
  • 38
3
votes
3 answers

JTidy Java API to convert HTML to XHTML

I am using JTidy to convert from HTML to XHTML, but I found in my XHTML file this tag  . Can I prevent it? This is my code: //from html to xhtml try { fis = new FileInputStream(htmlFileName); } catch…
mohammad
  • 2,142
  • 7
  • 35
  • 60
3
votes
1 answer

Can I configure JTidy to ignore certain errors and warnings?

I am using JTidy to validate snippets of HTML generated in a Java rendering class. I would like to ignore certain warnings and errors. (EDIT: On second thoughts I might not want to suppress errors) For example, the following snippet that is…
vegemite4me
  • 6,621
  • 5
  • 53
  • 79
3
votes
0 answers

How to set encoding attribute of XML prolog in JTidy?

I need to generate XML from an HTML file with JTidy. The encoding of the source is GB2312, so I need to set the encoding of the generated XML to GB2312 as well. Current XML prolog: What I need:
Wen Wu
  • 62
  • 3
3
votes
1 answer

JTidy HTML to XHTML returns empty file

I'm trying to create an XHTML file from an HTML file, but I'm facing an error. During conversion I get the following error: line 1 column 1 - Warning: inserting missing 'title' element InputStream: Document content looks like HTML 2.0 1 warning, no…
Zoltan Varadi
  • 2,468
  • 2
  • 34
  • 51
2
votes
1 answer

JTidy (HTML-Tidy) Configuration used on w3c HTML Validator

I am using JTidy (the Java port of the HTML Tidy library) to scrub some existing sites. When I use my configuration of JTidy, it seems to be very strict and ends up cutting off the bottom of the page (bad markup). When I run the same markup through…
empire29
  • 3,729
  • 6
  • 45
  • 71
2
votes
2 answers

How to convert org.w3c.dom.Document to org.jdom.Document

I need to convert an org.w3c.dom.Document to an org.jdom.Document. I have tried the following: InputStream inputStream = new ByteArrayInputStream(str.getBytes()); Tidy tidy = new Tidy(); tidy.setMakeClean(false); tidy.setShowWarnings(true);…
Komal Goyal
  • 233
  • 5
  • 17
2
votes
1 answer

JTidy - Pretty Printing without Head, Title Tags

I am trying to use JTidy to pretty-print an HTML snippet that I have. So far I have done the following: protected String prettyPrintHTML(String rawHTML) { Tidy tidy = new Tidy(); tidy.setXHTML(true); tidy.setIndentContent(true); …
Yohan Liyanage
  • 6,840
  • 3
  • 46
  • 63
2
votes
1 answer

How to set a parse duration limit on a Document object in Java

I am using the JTidy parser in Java. Here is my code: URL url = new URL("www.yahoo.com"); HttpURLConnection conn = (HttpURLConnection) url.openConnection(); InputStream in = conn.getInputStream(); Tidy tidy = new Tidy(); Document doc =…
DJ31
  • 1,219
  • 3
  • 14
  • 19
2
votes
2 answers

Java: remove < and > from text in XML (not tags)

I'm having a hard time escaping XML to be processed by Java. I'm using JTidy to escape unwanted characters, but struggle to remove "<" and ">" from values such as capacity < 1000. I'm using the code below to escape the input: public…
Jakub Sluka
  • 123
  • 1
  • 14
2
votes
4 answers

How to get the title text from any web page in Java

I am using Java to fetch the title text from a web page. I have fetched an image from a web page using the tag name as follows: int i=1; InputStream in=new URL("www.yahoo.com").openStream(); org.w3c.dom.Document doc= new Tidy().parseDOM(in, null); …
DJ31
  • 1,219
  • 3
  • 14
  • 19
2
votes
1 answer

JTidy StringIndexOutOfBoundsException in JMeter

I want to retrieve content from a webpage using JMeter. The data I'm looking for is inside a JavaScript block: (...)