What I am trying to do is extract the inner text of a single element from an XHTML file. I have narrowed my search down to the element node, but I cannot retrieve its text.
Please note: the element node has no child node, so trying to read the text from a child gives me a null pointer exception.
Here is the HTML snippet:
<div id="dvTitle" class="titlebtmbrdr01" style="line-height: 22px;">BAJAJ AUTO LTD. </div>
Please also note that this file uses the namespace http://www.w3.org/1999/xhtml
You can see the div element from which I want BAJAJ AUTO LTD.
Here is the code that I am using:
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Vector;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import jxl.read.biff.BiffException;
import jxl.write.WriteException;
import jxl.write.biff.RowsExceededException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import com.sun.org.apache.xml.internal.serialize.Serializer;
public class BSEQuotesExtractor implements valueExtractor {
    @Override
    public Vector<String> getName(Document d) throws XPathExpressionException, RowsExceededException, BiffException, WriteException, IOException {
        // TODO Auto-generated method stub
        XPathFactory factory = XPathFactory.newInstance();
        XPath xpath = factory.newXPath();
        xpath.setNamespaceContext(new MynamespaceContext());
        Object result = xpath.evaluate("//*[@id='dvTitle']",d, XPathConstants.NODESET);
        NodeList nodes = (NodeList) result;
        System.out.println(nodes.getLength());
        System.out.println(nodes.item(0).getNodeName());
        System.out.println(nodes.item(0).getAttributes().item(1).getNodeName());
        System.out.println(nodes.item(0).getAttributes().item(1).getNodeValue());
        System.out.println(nodes.item(0).getTextContent());
        return null;
    }

    public static void main(String[] args) throws MalformedURLException, IOException, XPathExpressionException, RowsExceededException, BiffException, WriteException{
        BSEQuotesExtractor q = new BSEQuotesExtractor();
        DOMParser parser = new DOMParser(new URL("http://www.bseindia.com/bseplus/StockReach/StockQuote/Equity/BAJAJ%20AUTO%20LTD/BAJAJAUT/532977/Scrips").openStream());
        Document d = parser.getDocument();
        q.getName(d);
    }
}
And this is the output I get:
1
div
dvTitle
null
Now why do I get that null? I should get BAJAJ AUTO LTD.
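For comparison, here is a minimal sketch of the behaviour I expect. It runs the same XPath expression against the snippet above, but it uses only the JDK's built-in DocumentBuilder rather than the DOMParser class from my code, and it assumes the markup is well-formed XHTML. The class name TitleTextSketch, the method extractTitle and the wrapping html/body elements are hypothetical and exist only for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original code.
public class TitleTextSketch {

    // Parses an XHTML string with the JDK parser and returns the trimmed
    // string value of the first element whose id attribute is 'dvTitle'.
    static String extractTitle(String xhtml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // the document declares the XHTML namespace
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // The two-argument evaluate() returns the result as a String,
            // i.e. the text of the first matching node. The wildcard step
            // sidesteps the namespace, and @id is an unqualified attribute.
            return xpath.evaluate("//*[@id='dvTitle']", doc).trim();
        } catch (Exception e) {
            throw new IllegalStateException("Could not extract title", e);
        }
    }

    public static void main(String[] args) {
        // Assumed wrapper around the div from the question.
        String xhtml =
                "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
              + "<div id=\"dvTitle\" class=\"titlebtmbrdr01\" style=\"line-height: 22px;\">"
              + "BAJAJ AUTO LTD. </div>"
              + "</body></html>";
        System.out.println(extractTitle(xhtml));
    }
}
```

If this sketch prints the company name while my code prints null, the difference would seem to lie in the Document produced by my DOMParser rather than in the XPath expression itself.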