Before asking this question, I tried several different approaches and searched both Google and StackOverflow, but I can't find a solution.
Basically, I want to create a tool that returns data based on a URL and an XPath expression, for example:
URL: http://www.google.co.uk/search?q=wicked+games
XPath: id('rso')/li/div/h3/a
which should return the result links on that page.
I can parse XML fine from other URLs, for example an actual XML file such as http://renualsoft.com/jordon/person.xml, but I'm unsure how to do the same for Google.
For the Google URL, I tried this:
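For reference, here is a minimal sketch of the kind of XPath evaluation that works on well-formed XML. It uses an inline string instead of a remote file so it is self-contained; the class name `XPathSketch`, the `query` helper, and the sample document are made up for illustration and are not the contents of person.xml.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathSketch {
    // Hypothetical helper: parses an XML string and returns the values of
    // all nodes matched by the given XPath expression, in document order.
    static List<String> query(String xml, String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);

        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getNodeValue());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document; only well-formed XML parses this way.
        String xml = "<people><person href=\"a.html\"/><person href=\"b.html\"/></people>";
        System.out.println(query(xml, "/people/person/@href")); // prints [a.html, b.html]
    }
}
```

This only works because the input is well-formed XML; an ordinary HTML page is not guaranteed to be.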
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder;
Document doc = null;
XPathExpression expr = null;
builder = factory.newDocumentBuilder();
doc = builder.parse("http://www.google.co.uk/search?q=wicked+games");
XPathFactory xFactory = XPathFactory.newInstance();
XPath xpath = xFactory.newXPath();
expr = xpath.compile("id('rso')/li/div/h3/a/@href");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
    System.out.println(nodes.item(i).getNodeValue());
}
However, I get this exception:
Exception in thread "main" java.io.IOException: Server returned HTTP response code: 403 for URL: http://www.google.co.uk/search?q=wicked+games
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1625)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:633)
at com.sun.org.apache.xerces.internal.impl.XMLVersionDetector.determineDocVersion(XMLVersionDetector.java:189)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:799)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:123)
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:237)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:300)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:177)
at NewEmptyJUnitTest.query(NewEmptyJUnitTest.java:35)
at NewEmptyJUnitTest.main(NewEmptyJUnitTest.java:77)
Java Result: 1
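The trace shows the 403 being raised in HttpURLConnection.getInputStream, so the request is refused before any XML parsing happens. One possible direction, as an unconfirmed sketch: open the connection manually and set an explicit User-Agent header, on the assumption (not verified here) that the server rejects Java's default one. The class name `FetchSketch`, the `open` helper, and the "Mozilla/5.0" value are placeholders for illustration.

```java
import java.net.HttpURLConnection;
import java.net.URI;

public class FetchSketch {
    // Hypothetical helper: prepares an HTTP connection with an explicit
    // User-Agent header. No network traffic happens until the caller
    // connects, e.g. by calling getInputStream().
    static HttpURLConnection open(String address, String userAgent) throws Exception {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        return conn;
    }

    public static void main(String[] args) throws Exception {
        HttpURLConnection conn =
                open("http://www.google.co.uk/search?q=wicked+games", "Mozilla/5.0");
        // Shows the header that would be sent; nothing has been requested yet.
        System.out.println(conn.getRequestProperty("User-Agent"));
        // conn.getInputStream() would perform the request, and the stream
        // could then be handed to DocumentBuilder.parse(InputStream).
    }
}
```

Even if the request then succeeds, the response is HTML rather than XML, so DocumentBuilder.parse may still fail on markup that is not well-formed.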
Any help or guidance would be great, thanks. As I said, I have looked elsewhere for an answer but couldn't find anything useful.