Questions tagged [jtidy]

JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used to clean up malformed and faulty HTML. In addition, JTidy provides a DOM interface to the document being processed, so it can also serve as a DOM parser for real-world HTML.

JTidy was written by Andy Quick, who later stepped down from the maintainer position. Now JTidy is maintained by a group of volunteers.

Official Website: http://jtidy.sourceforge.net/

97 questions
0 votes, 0 answers

JTidy doctype error blocks parsing

I've been trying to scrape some online stuff using JTidy, but I got this annoying error and I have no idea how to fix it or get JTidy to ignore it: InputStream: Doctype given is "-//W3C//DTD XHTML 1.0 Transitional//EN" InputStream: Document content…
Cassidy Laidlaw
0 votes, 1 answer

Can I prevent JTidy from converting an apostrophe in an attribute value to an entity

My input HTML has a line similar to this:
which JTidy is converting to
Is…
rhuffstedtler
0 votes, 1 answer

Displaying Jtidy error/warning messages in a GUI JTextArea

I am writing a program that uses JTidy to clean up HTML source obtained from a URL. I want to display the errors and warnings in a GUI, in a JTextArea. How would I "reroute" the warnings from printing to stdout to the JTextArea? I've…
cHam
0 votes, 1 answer

How to use debugger in netbeans that imports the w3c Tidy interface in java

I have a bug in my code that uses JTidy to clean some HTML files. When it finds malformed HTML, I have it just skip the file. But sometimes the program stalls on a malformed file, so I want to see what's going on in my code. But I can't seem to run the…
Dan
0 votes, 0 answers

HTML parser without tidying the source

I have several hundred old HTML files on my machine that I am trying to parse to extract some data. I have tried different Java parsers, including Jsoup, TagSoup, HtmlCleaner, JTidy, etc. Because of the way the HTML is written in these files, I can only use…
PTS Admin
0 votes, 1 answer

Cleaning up Html5 pages with Java: Is it possible?

I need to clean up HTML5 pages inside my Java project, so I need a Java library or a command-line program that works on both Linux and Windows. JTidy doesn't work well (I tested it). HTML Tidy for HTML5 is a C++ library, and its command-line version…
-2 votes, 1 answer

Remove redundant space in HTML in JAVA

I need to perform some HTML cleansing. I have HTML with lots of redundant br tags; so far I have tried HtmlCleaner and JTidy without any results. Example:



... What I would like is just to get a single <br> back. Any other ways to…
AlexVPerl