Connect to website using nodes

Question

I'm trying to write a program that connects to a website, fetches the page source, and finds the <body> tag by walking the DOM nodes. Inside that tag there are three text fields that I want to fill in and then send back to the website.

I have got as far as finding the <body> tag, but now I'm stuck.

try
{
  Tidy tidy = new Tidy();
  ByteArrayOutputStream baos = new ByteArrayOutputStream();
  Document docx = tidy.parseDOM(new URL("http://www.clubvip.co.za/Login.aspx").openStream(), baos);
  Node n = docx.getFirstChild();
  System.out.println(n.getNodeName());
  n = n.getFirstChild();

  // Walk the siblings until the <body> element is found (or none are left).
  // Compare names with equals(): != on Strings compares references, not contents.
  while (n != null && !"body".equalsIgnoreCase(n.getNodeName()))
  {
    System.out.println(n.getNodeName());
    n = n.getNextSibling();
  }
}
catch (IOException e)
{
  e.printStackTrace();
}

Have you considered using JSoup? It's designed for web scraping like this and, in my opinion, provides a nicer interface than the DOM (and, more importantly, handles all sorts of nasty broken HTML). - Jeff Foster, Jul 12 '11 at 13:37

score 0 · Answer 1 · answered Jul 12 '11 at 14:01

You can actually get those tags directly by using

docx.getElementsByTagName("tagname")

See the documentation for org.w3c.dom.Document.

This will return a NodeList you can iterate through.
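
As a rough sketch of what iterating that NodeList could look like: the example below parses a small, well-formed stand-in page with the JDK's own DocumentBuilder rather than JTidy, so it runs without extra libraries. The sample markup, the field names (user, pass, code), and the helpers parse and fillInputs are all made up for illustration; the real login page will have its own names. The same getElementsByTagName loop should apply to the Document that tidy.parseDOM returns.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class InputFieldExample {

    // Hypothetical stand-in for the login page; the real field names are unknown.
    static final String SAMPLE_PAGE =
        "<html><head><title>Login</title></head><body><form>"
        + "<input type=\"text\" name=\"user\"/>"
        + "<input type=\"text\" name=\"pass\"/>"
        + "<input type=\"text\" name=\"code\"/>"
        + "</form></body></html>";

    // Parse well-formed markup into an org.w3c.dom.Document.
    static Document parse(String markup) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(markup)));
    }

    // Set the value attribute on every <input> whose name appears in the map,
    // and return how many fields were filled.
    static int fillInputs(Document doc, Map<String, String> values) {
        NodeList inputs = doc.getElementsByTagName("input");
        int filled = 0;
        for (int i = 0; i < inputs.getLength(); i++) {
            // getElementsByTagName only returns elements, so the cast is safe.
            Element input = (Element) inputs.item(i);
            String name = input.getAttribute("name");
            if (values.containsKey(name)) {
                input.setAttribute("value", values.get(name));
                filled++;
            }
        }
        return filled;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE_PAGE);
        Map<String, String> values = new LinkedHashMap<>();
        values.put("user", "alice");
        values.put("pass", "secret");
        values.put("code", "1234");
        int filled = fillInputs(doc, values);
        int total = doc.getElementsByTagName("input").getLength();
        System.out.println("Filled " + filled + " of " + total + " input fields");
    }
}
```

Note that setting the value attribute only changes the in-memory DOM. Sending the values back to the site would still need a separate HTTP request that submits the form fields.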

answered Jul 12 '11 at 14:01

aldrin
