
I am writing an application that uses HtmlUnit to screen scrape some data. The logic for which fields come from which parts of the page, and the XPath expressions that retrieve them, is getting a bit complicated, so before I refactor I want to write some simple unit tests. I have used the 'page.asXml()' method to get the page XML and saved that as a file in my test resources folder, but how can I load it back in as an HtmlPage?

For example, this is how I captured the page:

    HtmlPage page = webClient.getPage(url);
    System.out.println(page.asXml());

Now in my unit test I want to do the equivalent of:

    HtmlPage page = new HtmlPage(myXmlTestFile);

But I can't seem to find anything that will do this. Any ideas?

Matt

3 Answers


My final solution (concatenated from a number of other SO posts):

    // Placeholder URL for the response; the page content comes from the string below
    URL url = new URL("http://www.example.com");

    // Load the saved page from the test classpath (assumes the file was saved as UTF-8)
    InputStream is = this.getClass().getClassLoader().getResourceAsStream("myPageXmlFile.xml");
    String xmlPageString = IOUtils.toString(is, StandardCharsets.UTF_8);

    StringWebResponse response = new StringWebResponse(xmlPageString, url);
    WebClient client = WebClientConnector.createWebClient(false); // helper method for creating a WebClient instance
    HtmlPage page = HTMLParser.parseXHtml(response, client.getCurrentWindow());

Matt

I think there are 3 things you should try:

  1. Save it as XML and then load it again from your local file system (this is what you're trying to do):

    // save the page as a string into file "myfile.xml" and then...
    HtmlPage page = webClient.getPage("file:///home/Matt/Desktop/myfile.xml");
    
  2. Save it as an HTML page and then load it the same way as in the previous item:

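    // save(...) takes a File, so derive the file: URL from it rather than hard-coding it
    File myFile = new File("/home/Matt/Desktop/myfile.html");
    page.save(myFile);
    HtmlPage loadedPage = webClient.getPage(myFile.toURI().toURL());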
    
  3. And, most likely, the best way to go: just process the page while downloading and save the data you need instead of the whole page:

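    // getFirstByXPath returns a node, not a String; note the @ in the attribute test
    HtmlElement div = page.getFirstByXPath("//div[@id='magic_id']");
    String pieceOfData = div.getTextContent();
    aCSVFile.write(pieceOfData);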
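
One detail worth checking in any XPath like the one in item 3: an attribute test is written `@id`, while a bare `id` matches a child element named `id`. The sketch below illustrates the difference using only the JDK's `javax.xml.xpath` API rather than HtmlUnit, so the class name `XPathSketch`, the helper `firstText`, and the sample markup are all hypothetical and only meant to show the expression semantics.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of HtmlUnit.
class XPathSketch {

    // Returns the string value of the first node matching the expression,
    // or an empty string when nothing matches.
    static String firstText(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath or XML: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample markup for illustration only.
        String xml = "<html><body><div id=\"magic_id\">42</div><div>other</div></body></html>";

        // Attribute test: selects the div whose id attribute equals magic_id.
        System.out.println(firstText(xml, "//div[@id='magic_id']"));

        // Child-element test: looks for a div with a child element <id>, so nothing matches here.
        System.out.println(firstText(xml, "//div[id='magic_id']").isEmpty());
    }
}
```

Small checks like this run without a network connection, which makes them a cheap way to pin down each XPath expression before refactoring the scraper itself.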
    
Mosty Mostacho
  • The first one worked, so I have accepted your answer, thanks! Although, for unit testing it is best not to have absolute references to file paths so I have added my final solution below – Matt Dec 10 '13 at 10:09

I use this in my unit tests:

    URL input = getClass().getResource("/path/to/file.xml");
    XmlPage xmlDoc = new WebClient().getPage(input);

This is better for CI because the file is resolved from the classpath, so you don't need to handle absolute paths with file:///...
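
If the file is not on the classpath, a `file:` URL can still be derived from a `Path` instead of being typed by hand. The following is a minimal JDK-only sketch of that idea (Java 9+ for `readAllBytes`); the class `FileUrlSketch` and its helpers are hypothetical names, and the resulting `URL` is the kind of value that could be passed to `WebClient.getPage(URL)` as above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class for illustration.
class FileUrlSketch {

    // Writes content to a temporary .xml file and returns its path.
    static Path writeTemp(String content) {
        try {
            Path file = Files.createTempFile("page", ".xml");
            Files.write(file, content.getBytes(StandardCharsets.UTF_8));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a file: URL from a path, so no "file:///..." string is hard-coded.
    static URL toUrl(Path path) {
        try {
            return path.toAbsolutePath().toUri().toURL();
        } catch (IOException e) { // MalformedURLException is an IOException
            throw new UncheckedIOException(e);
        }
    }

    // Reads everything behind the URL as UTF-8 text.
    static String read(URL url) {
        try (InputStream in = url.openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeTemp("<html><body>saved page</body></html>");
        URL url = toUrl(file);
        System.out.println(url.getProtocol());
        System.out.println(read(url));
    }
}
```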

Kef