JTidy doctype error blocks parsing

Asked Jan 04 '13 at 02:20

Active Jan 04 '13 at 02:20

Viewed 439 times

I've been trying to scrape some online stuff using JTidy, but I got this annoying error and I have no idea how to fix it or get JTidy to ignore it:

InputStream: Doctype given is "-//W3C//DTD XHTML 1.0 Transitional//EN"
InputStream: Document content looks like XHTML 1.0 Transitional
630 warnings, 1 error were found!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

It seems like a silly error, and no other errors are reported, so it appears to be the one blocking JTidy from parsing the document. I'm parsing it from an InputStream taken directly from an `HttpURLConnection`, using the method `Tidy.parseDOM`.
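
For readers trying to reproduce this, here is a minimal sketch of the kind of call described above. It is not the asker's actual code: the class name `TidyForceOutputSketch` and the helper `parseLeniently` are hypothetical, and it assumes a JTidy release (such as r938) whose `Tidy` class provides `setForceOutput`, `setShowWarnings`, and `setErrout`. `setForceOutput(true)` asks JTidy to produce output even when errors are reported; whether that is enough for a given page depends on what the single reported error actually is, which the log above does not show.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;

import org.w3c.dom.Document;
import org.w3c.tidy.Tidy;

public class TidyForceOutputSketch {

    // Hypothetical helper: parses the stream with JTidy and returns the DOM.
    // JTidy's diagnostics are collected in `messages` instead of going to stderr.
    static Document parseLeniently(InputStream in, StringWriter messages) {
        Tidy tidy = new Tidy();
        tidy.setXHTML(true);          // the page in the question declares XHTML 1.0 Transitional
        tidy.setShowWarnings(false);  // hide warnings so any error message is easier to spot
        tidy.setForceOutput(true);    // assumption: request output even if errors are found
        tidy.setErrout(new PrintWriter(messages, true));
        // The second argument is the stream for tidied markup; null skips writing it.
        return tidy.parseDOM(in, null);
    }

    public static void main(String[] args) {
        // Stand-in for the HttpURLConnection stream, so the sketch needs no network.
        String html = "<html><head><title>t</title></head><body><p>hi</p></body></html>";
        InputStream in = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));

        StringWriter messages = new StringWriter();
        Document doc = parseLeniently(in, messages);

        System.out.println("Parsed: " + (doc != null));
        System.out.println(messages);
    }
}
```

With a live page, the stream from `HttpURLConnection.getInputStream()` would take the place of the `ByteArrayInputStream`. Reading the collected messages with warnings hidden should also reveal the text of the one error, which is usually the fastest way to see why JTidy refuses to continue.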

Cassidy Laidlaw

Have you imported the jtidy package in your import statements? – Adesh singh Jan 04 '13 at 04:29
Yes -- `import org.w3c.tidy.Tidy;` – Cassidy Laidlaw Jan 04 '13 at 11:40
If anyone's having the same problem: I gave up on JTidy and switched to JSoup, which works much better. – Cassidy Laidlaw Jan 06 '13 at 16:15

0 Answers