
I'm trying to write a program that connects to a website, fetches its source, and locates the <body> tag by walking the DOM nodes. Inside that tag there are three text fields that I want to fill in with values and then submit back to the website.

I have got as far as finding the <body> tag, but now I'm stuck.

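// needs: java.io.ByteArrayOutputStream, java.io.IOException, java.net.URL,
//        org.w3c.dom.Document, org.w3c.dom.Node, org.w3c.tidy.Tidy
try
{
  Tidy tidy = new Tidy();
  ByteArrayOutputStream baos = new ByteArrayOutputStream();
  Document docx = tidy.parseDOM(new URL("http://www.clubvip.co.za/Login.aspx").openStream(), baos);
  Node n = docx.getFirstChild();
  System.out.println(n.getNodeName());
  n = n.getFirstChild();

  // Walk the siblings until <body> is found.
  // Strings must be compared with equals(), not with != (which compares references).
  while (n != null && !"body".equals(n.getNodeName()))
  {
    System.out.println(n.getNodeName());
    n = n.getNextSibling();
  }
  // n is now the <body> node, or null if no <body> sibling was found
}
catch (IOException e)
{
  e.printStackTrace();
}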
Paŭlo Ebermann
Foxticity
  • Have you considered using JSoup? It's designed for web scraping like this and imho provides a nicer interface than the DOM (and, more importantly, handles all sorts of nasty broken HTML). – Jeff Foster Jul 12 '11 at 13:37
  • Thanks, will try the JSoup tonight. :) – Foxticity Jul 12 '11 at 14:06

1 Answer

You can actually get those tags directly by using

docx.getElementsByTagName("tagname")

See the documentation for org.w3c.dom.Document.getElementsByTagName.

This will return a NodeList you can iterate through.
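
As a rough sketch of that iteration, the example below collects the name attribute of every <input> element. To stay self-contained it parses a small, well-formed sample string with the JDK's built-in DocumentBuilder instead of JTidy; the class name InputFinder, the helper methods, and the sample markup with its field names "user" and "pass" are all made up for illustration and are not taken from the real login page. The same getElementsByTagName loop should apply to the Document returned by tidy.parseDOM.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class InputFinder {

    // Collects the "name" attribute of every <input> element, in document order.
    public static List<String> inputNames(Document doc) {
        List<String> names = new ArrayList<>();
        NodeList inputs = doc.getElementsByTagName("input");
        for (int i = 0; i < inputs.getLength(); i++) {
            Element input = (Element) inputs.item(i);
            names.add(input.getAttribute("name"));
        }
        return names;
    }

    // Parses a well-formed markup string with the JDK's built-in XML parser
    // (a stand-in for JTidy, which also copes with broken HTML).
    public static Document parse(String markup) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample page; the field names are assumptions.
        String page = "<html><body><form>"
                + "<input type=\"text\" name=\"user\"/>"
                + "<input type=\"text\" name=\"pass\"/>"
                + "</form></body></html>";
        System.out.println(inputNames(parse(page))); // prints [user, pass]
    }
}
```

Because getElementsByTagName searches the whole tree, there is no need to locate <body> first; the inputs are found wherever they are nested.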

aldrin