0

JPedal library in java is usually used to convert pdf to XML or HTML. However, I needed to know if we could extract data from HTML5 document and save it to XML using JPedal library API? Is there any other possible alternative to this?

Also , I am trying to parse HTML5 document using Java and store it in XML. are there any good solutions to find just specific tags and render an XML out of them?

Please do let me know . Thank you.

mark stephens
  • 3,205
  • 16
  • 19

1 Answers1

0

There are a number of Java HTML parsers out there, but I recommend using the HTML5 parser from validator.nu available for download from here: http://about.validator.nu/htmlparser/.

Written to use the HTML5 parser algorithm by one of the main protagonists of HTML5, Henri Sivonen of Mozilla, you won't find a more reliable HTML parser and it creates a true DOM that can be manipulated using standard XML tools and queried for hyperlinks using XPath. There are examples of how to use XSLT transformations with it and how to get an XML serialization of the created DOM.

Alohci
  • 78,296
  • 16
  • 112
  • 156