
I am writing a utility that should hit the URL of a dynamic page, retrieve the content, find a specific div tag among various nested div tags, and grab its content.

Mainly, I am looking for Java code or a Java library; JavaScript or a JavaScript-based library would also work for me.

I have shortlisted the following: JSoup, Jerry, and JTidy (last updated 2009-12-01). Which one is best performance-wise?

Edit: Rephrased the question and added the shortlisted libraries.

Sourabh

5 Answers


If you want to scrape a page and parse it, I recommend using Node with jsdom.

Install Node.js (assuming Linux):

sudo apt-get install git
cd ~
git clone git://github.com/joyent/node
cd node
git checkout v0.6
mkdir ~/.local # If it doesn't already exist
./configure --prefix=~/.local
make
make install

There is also a Windows installer: http://nodejs.org/dist/v0.6.6/node-v0.6.6.msi

Install jsdom:

$ npm install jsdom

Run this script, modified with your URL and the relevant selectors:

var jsdom = require('jsdom');

jsdom.env({
    url: 'http://example.com/page', // the page you want to scrape
    done: function (errors, window) {
        // grab the target div by id and print its text content
        console.log(window.document.getElementById('foo').textContent);
    }
});
Munter

If you like jQuery's simple syntax, you can try Jerry:

Jerry is a jQuery in Java. Jerry is a fast and concise Java Library that simplifies HTML document parsing, traversing and manipulating.
Jerry is designed to change the way that you parse HTML content.

The syntax is very simple and should solve your problem in just a few lines of code, as in the sketch below.
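A minimal sketch of that idea, assuming the page source has already been downloaded into a string (the HTML literal and the div#content selector below are placeholders, not from the question):

import jodd.jerry.Jerry;

public class JerryExample {
    public static void main(String[] args) {
        // placeholder HTML standing in for the page source you fetched beforehand
        String html = "<html><body><div><div id='content'>Hello</div></div></body></html>";
        // wrap the markup in a Jerry document
        Jerry doc = Jerry.jerry(html);
        // select the nested div by id and grab its text
        System.out.println(doc.$("div#content").text());
    }
}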

Jean-Philippe Briend

http://jtidy.sourceforge.net/

JTidy is pretty good at parsing HTML into a W3C DOM that you can then traverse.
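A rough sketch of that approach, assuming the page can be fetched with a plain URL stream and the target div carries an id attribute (the URL and the id value below are placeholders):

import java.io.InputStream;
import java.net.URL;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.tidy.Tidy;

public class JTidyExample {

    public static void main(String[] args) throws Exception {
        Tidy tidy = new Tidy();
        tidy.setQuiet(true);
        tidy.setShowWarnings(false);
        InputStream in = new URL("http://example.com/page").openStream();
        // JTidy cleans up the markup and exposes it as a W3C DOM Document
        Document doc = tidy.parseDOM(in, null);
        in.close();
        NodeList divs = doc.getElementsByTagName("div");
        for (int i = 0; i < divs.getLength(); i++) {
            Element div = (Element) divs.item(i);
            if ("content".equals(div.getAttribute("id"))) {
                System.out.println(text(div));
            }
        }
    }

    // collect the text under a node by walking its children
    private static String text(Node node) {
        StringBuilder sb = new StringBuilder();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                sb.append(child.getNodeValue());
            } else {
                sb.append(text(child));
            }
        }
        return sb.toString();
    }
}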

Squiggs.

If what you're after is a selector engine, then Sizzle is your best bet. It's the engine used by jQuery.

isNaN1247

Give each div a unique id and retrieve the one you want with document.getElementById(id).

Dau