1

How can I parse non-well-formed HTML on Android?

I tried to use XOM and TagSoup, but I get the following error when creating the Builder:

11-26 20:42:39.294: ERROR/dalvikvm(1298): Could not find method org.apache.xerces.impl.Version.getVersion, referenced from method nu.xom.Builder.

Must I install Xerces to use XOM, or can I use TagSoup without XOM?

Nikhil
Kristof

2 Answers

2

You might find JTidy (http://jtidy.sourceforge.net/), a port of HTML Tidy, to be sufficiently lightweight. It can output XHTML on request.

peter.murray.rust
0

XOM may require Xerces to be on the classpath; whether it does may depend on the Java version. We currently use:

xercesImpl-2.8.0.jar
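
To see whether Xerces is actually visible at runtime, a quick check like the one below may help. It is only a sketch: the class name `org.apache.xerces.impl.Version` is taken from the error message in the question, while `XercesCheck` and `isClassAvailable` are hypothetical names made up for this example. It uses nothing beyond the standard `Class.forName` call, so it says whether the class can be loaded, not whether XOM will work with it.

```java
public class XercesCheck {

    /**
     * Returns true if the named class can be located by this class's
     * class loader, false if it is missing or fails to link.
     */
    public static boolean isClassAvailable(String className) {
        try {
            // initialize = false: only look the class up, do not run
            // its static initializers.
            Class.forName(className, false, XercesCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class named in the dalvikvm error from the question.
        String xercesVersionClass = "org.apache.xerces.impl.Version";
        System.out.println(xercesVersionClass + " available: "
                + isClassAvailable(xercesVersionClass));
    }
}
```

If this reports that the class is not available, the Xerces jar is not on the classpath that the application actually runs with.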
peter.murray.rust
  • 1
    I think Xerces itself is too heavy to work on Android... I don't understand why I can't find information about something as basic as HTML scraping for Android... – Kristof Nov 26 '09 at 22:14