Questions tagged [htmlunit]

HtmlUnit is a "headless browser": there is no browser GUI and it does no rendering, although it has CSS and JavaScript engines to simulate a real browser. Its primary purposes are testing and information extraction.

HtmlUnit is a "GUI-Less browser for Java programs". It models HTML documents and provides an API that allows you to invoke pages, fill out forms, click links, etc., just as you would in your "normal" browser.
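A minimal sketch of that workflow (invoke a page, fill out a form including a dropdown, submit it, then click a link), assuming HtmlUnit 2.x, whose classes live under com.gargoylesoftware.htmlunit (3.x renamed the package to org.htmlunit). The site is hypothetical: MockWebConnection serves made-up pages at http://example.test/ from memory so the sketch runs offline, and the class, form and field names are illustrative only.

```java
import com.gargoylesoftware.htmlunit.MockWebConnection;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebRequest;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlSelect;
import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;
import com.gargoylesoftware.htmlunit.util.NameValuePair;
import java.net.URL;

public final class FormAndLinkSketch {

    // Hypothetical in-memory site, so the sketch runs without network access.
    static MockWebConnection fakeSite() throws Exception {
        MockWebConnection conn = new MockWebConnection();
        conn.setResponse(new URL("http://example.test/"),
            "<html><head><title>Start</title></head><body>"
            + "<form name='login' action='/login' method='post'>"
            + "<input type='text' name='user'>"
            + "<select name='lang'><option value='en'>English</option>"
            + "<option value='de'>Deutsch</option></select>"
            + "<input type='submit' name='go' value='Log in'></form></body></html>");
        conn.setResponse(new URL("http://example.test/login"),
            "<html><head><title>Welcome</title></head><body>"
            + "<a id='help' href='/help'>Help</a></body></html>");
        conn.setResponse(new URL("http://example.test/help"),
            "<html><head><title>Help</title></head><body></body></html>");
        return conn;
    }

    // Returns the value of a submitted form parameter, or null if it is absent.
    static String param(WebRequest request, String name) {
        for (NameValuePair p : request.getRequestParameters()) {
            if (p.getName().equals(name)) {
                return p.getValue();
            }
        }
        return null;
    }

    // Returns {title after submit, submitted user, submitted lang, title after link click}.
    static String[] run() throws Exception {
        MockWebConnection conn = fakeSite();
        try (WebClient webClient = new WebClient()) {
            webClient.setWebConnection(conn);

            // 1. Invoke a page.
            HtmlPage start = webClient.getPage("http://example.test/");

            // 2. Fill out the form: a text field and a dropdown.
            HtmlForm form = start.getFormByName("login");
            HtmlTextInput user = form.getInputByName("user");
            user.setValueAttribute("alice");
            HtmlSelect lang = form.getSelectByName("lang");
            lang.setSelectedAttribute(lang.getOptionByValue("de"), true);

            // 3. Submit by clicking the button, as a user would.
            HtmlSubmitInput go = form.getInputByName("go");
            HtmlPage welcome = go.click();
            WebRequest sent = conn.getLastWebRequest();

            // 4. Click a link on the resulting page.
            HtmlAnchor help = welcome.getHtmlElementById("help");
            HtmlPage helpPage = help.click();

            return new String[] {welcome.getTitleText(), param(sent, "user"),
                                 param(sent, "lang"), helpPage.getTitleText()};
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(String.join(" | ", run()));
    }
}
```

Against a real site you would drop the setWebConnection call and pass the real URL; the element lookups and click() calls stay the same.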

It has fairly good JavaScript support (which is constantly improving) and is able to work even with quite complex AJAX libraries, simulating Chrome, Firefox or Internet Explorer depending on the configuration used.
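As a hedged sketch of the JavaScript side, assuming HtmlUnit 2.x: the simulated browser is chosen by passing a BrowserVersion to the WebClient constructor, and scripts run while the page loads. The hypothetical page below builds a list item in script, which shows the difference between the HTML the server sent (getWebResponse().getContentAsString()) and the DOM as it stands after the scripts ran (asXml()). The 1000 ms passed to waitForBackgroundJavaScript is an arbitrary choice for pages that finish work in timers or AJAX callbacks.

```java
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.MockWebConnection;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import java.net.URL;

public final class JavaScriptSketch {

    // Returns {raw source as served, DOM serialized after scripts ran}.
    static String[] load(BrowserVersion browser) throws Exception {
        MockWebConnection conn = new MockWebConnection();
        // Hypothetical page whose list item is created by JavaScript, not sent by the server.
        conn.setResponse(new URL("http://example.test/"),
            "<html><head><title>JS</title></head><body><ul id='list'></ul><script>"
            + "var li = document.createElement('li');"
            + "li.appendChild(document.createTextNode('added by script'));"
            + "document.getElementById('list').appendChild(li);"
            + "</script></body></html>");

        // The constructor argument selects which browser HtmlUnit simulates.
        try (WebClient webClient = new WebClient(browser)) {
            webClient.setWebConnection(conn);
            HtmlPage page = webClient.getPage("http://example.test/");
            // Let timers and AJAX callbacks finish (nothing pending here, common on real sites).
            webClient.waitForBackgroundJavaScript(1000);

            String rawSource = page.getWebResponse().getContentAsString();
            String currentDom = page.asXml();
            return new String[] {rawSource, currentDom};
        }
    }

    public static void main(String[] args) throws Exception {
        String[] result = load(BrowserVersion.CHROME);
        System.out.println("served HTML has <li>: " + result[0].contains("<li>"));
        System.out.println("current DOM has <li>: " + result[1].contains("<li>"));
    }
}
```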

It is typically used for testing purposes or to retrieve information from web sites.

HtmlUnit is not a generic unit testing framework. It is specifically a way to simulate a browser for testing purposes and is intended to be used within another testing framework such as JUnit or TestNG.
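For example, a minimal JUnit 4 test might look like this (JUnit 4 and HtmlUnit 2.x assumed). The page and its title are hypothetical and served from memory by MockWebConnection; a real test would instead point the WebClient at the application under test.

```java
import static org.junit.Assert.assertEquals;

import com.gargoylesoftware.htmlunit.MockWebConnection;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import java.net.URL;
import org.junit.Test;

public class StartPageTest {

    @Test
    public void startPageHasExpectedTitle() throws Exception {
        // In-memory stand-in for the application under test (hypothetical URL and markup).
        MockWebConnection conn = new MockWebConnection();
        conn.setResponse(new URL("http://example.test/"),
            "<html><head><title>My App</title></head><body></body></html>");

        try (WebClient webClient = new WebClient()) {
            webClient.setWebConnection(conn);
            HtmlPage page = webClient.getPage("http://example.test/");
            // HtmlUnit only drives the page; JUnit does the asserting and reporting.
            assertEquals("My App", page.getTitleText());
        }
    }
}
```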

HtmlUnit is used as the underlying "browser" by various open-source tools, including Canoo WebTest, JWebUnit, Selenium WebDriver, JSFUnit and Celerity.

HtmlUnit was originally written by Mike Bowler of Gargoyle Software and is released under the Apache 2 license.

1835 questions
15 votes, 5 answers

How to ignore HTMLUnit warnings/errors related to jQuery?

Is it possible to teach HTMLUnit to ignore certain javascript scripts/files on a web page? Some of them are just out of my control (like jQuery) and I can't do anything with them. Warnings are annoying, for example: [WARN]…
yegor256
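One approach to the question above, sketched under assumptions: wrap the client's connection in a WebConnectionWrapper and answer requests for the unwanted script with a harmless stand-in, so HtmlUnit never downloads or runs it. The page, the URLs and the jquery.min.js match are hypothetical, and HtmlUnit 2.x is assumed. Anything on the page that depends on the blocked script will of course fail; if the goal is only to keep going past script errors, webClient.getOptions().setThrowExceptionOnScriptError(false) is the blunter alternative.

```java
import com.gargoylesoftware.htmlunit.MockWebConnection;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebRequest;
import com.gargoylesoftware.htmlunit.WebResponse;
import com.gargoylesoftware.htmlunit.WebResponseData;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.util.NameValuePair;
import com.gargoylesoftware.htmlunit.util.WebConnectionWrapper;
import java.io.IOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Collections;

public final class BlockScriptsSketch {

    // Loads a hypothetical page and returns its title after the inline script ran.
    static String loadWithoutJQuery() throws Exception {
        MockWebConnection conn = new MockWebConnection();
        conn.setResponse(new URL("http://example.test/"),
            "<html><head><title>before</title>"
            + "<script src='/jquery.min.js'></script>"
            + "<script>document.title = 'after';</script>"
            + "</head><body></body></html>");
        // Stand-in for a noisy third-party script: it would throw if it were ever executed.
        conn.setResponse(new URL("http://example.test/jquery.min.js"),
            "throw new Error('noisy third-party script');", "application/javascript");

        try (WebClient webClient = new WebClient()) {
            webClient.setWebConnection(conn);
            // The wrapper installs itself as webClient's connection and delegates to conn.
            new WebConnectionWrapper(webClient) {
                @Override
                public WebResponse getResponse(WebRequest request) throws IOException {
                    if (request.getUrl().getPath().endsWith("jquery.min.js")) {
                        // Answer with an inert script instead of fetching the real file.
                        WebResponseData stub = new WebResponseData(
                            "/* blocked */".getBytes(StandardCharsets.UTF_8), 200, "OK",
                            Collections.singletonList(
                                new NameValuePair("Content-Type", "application/javascript")));
                        return new WebResponse(stub, request, 0L);
                    }
                    return super.getResponse(request);
                }
            };
            HtmlPage page = webClient.getPage("http://example.test/");
            return page.getTitleText();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(loadWithoutJQuery());
    }
}
```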
14 votes, 1 answer

Making AJAX Applications Crawlable? How to build a simple web service on Google App Engine to produce HTML Snapshots?

Real World Problem: I have my app hosted on Heroku, who (to my knowledge) are unable to offer a solution for running a Headless (GUI-less) Browser - such as HTMLUnit - for generating HTML Snapshots for Googlebot to index my AJAX content. My…
Chris Jacob
14 votes, 1 answer

Is there a way to trigger scroll event with HtmlUnit or is it not possible at all?

I am currently learning HtmlUnit in order to scrape websites. Everything went well and smooth until I encountered a dynamic page (as an example, I am using Pinterest website) on which elements are added on the fly when the user scrolls down. I have…
DjCode
14 votes, 7 answers

Screen scraping with Python

Does Python have screen scraping libraries that offer JavaScript support? I've been using pycurl for simple HTML requests, and Java's HtmlUnit for more complicated requests requiring JavaScript support. Ideally I would like to be able to do…
Marco
14 votes, 1 answer

HTMLUnit: Tons of obsolete content and can't create objects warnings on getPage() then fails with Exception invoking setOuterHTML on getByXPath()

I'm trying out HTMLUnit to automate downloading data off a webapp. However, I am getting a whole mess of warnings on getPage() (most of which seem to deal with linked scripts that I don't think I even need) and then a fatal…
Jeff
13 votes, 3 answers

How do I use the HTMLUnit driver with Selenium from Python?

How do I tell Selenium to use HTMLUnit? I'm running selenium-server-standalone-2.0b1.jar as a Selenium server in the background, and the latest Python bindings installed with "pip install -U selenium". Everything works fine with Firefox. But I'd…
frabcus
13 votes, 2 answers

Java - Sending a post request with HtmlUnit

Can't really find any help on this but I've been trying to send a post request with HtmlUnit. The code I have is: final WebClient webClient = new WebClient(); // Instead of requesting the page directly we create a WebRequestSettings…
Joe Taylor
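A minimal sketch of such a POST, assuming a later HtmlUnit 2.x release, where the request class is WebRequest (WebRequestSettings, as in the excerpt, belongs to older versions). The endpoint and parameter names are hypothetical, and MockWebConnection stands in for the server. The reply is treated as an opaque string: a JSON body is not HTML, so it typically comes back as an UnexpectedPage rather than an HtmlPage (the case asked about in the JSON question further down), and getWebResponse().getContentAsString() hands you its text to pass to a JSON library of your choice.

```java
import com.gargoylesoftware.htmlunit.HttpMethod;
import com.gargoylesoftware.htmlunit.MockWebConnection;
import com.gargoylesoftware.htmlunit.Page;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebRequest;
import com.gargoylesoftware.htmlunit.util.NameValuePair;
import java.net.URL;
import java.util.Arrays;

public final class PostSketch {

    static final String ENDPOINT = "http://example.test/api/login"; // hypothetical

    // Sends user/password as form-encoded POST parameters and returns the raw response body.
    static String post(MockWebConnection conn, String user, String password) throws Exception {
        try (WebClient webClient = new WebClient()) {
            webClient.setWebConnection(conn);

            // Build the request explicitly instead of loading a page and submitting its form.
            WebRequest request = new WebRequest(new URL(ENDPOINT), HttpMethod.POST);
            request.setRequestParameters(Arrays.asList(
                new NameValuePair("user", user),
                new NameValuePair("password", password)));

            // Declared as Page, not HtmlPage, so a non-HTML reply does not cause a ClassCastException.
            Page page = webClient.getPage(request);
            return page.getWebResponse().getContentAsString();
        }
    }

    public static void main(String[] args) throws Exception {
        MockWebConnection conn = new MockWebConnection();
        conn.setResponse(new URL(ENDPOINT), "{\"ok\":true}", "application/json");
        System.out.println(post(conn, "alice", "secret"));
        System.out.println(conn.getLastWebRequest().getHttpMethod());
    }
}
```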
13 votes, 2 answers

Accessing html generated by Javascript with htmlunit -Java

I am trying to be able to test a website that uses javascript to render most of the HTML. With the HTMLUNIT browser how would you be able to access the html generated by the javascript? I was looking through their documentation but wasn't sure what…
rush66
13 votes, 2 answers

Html, handling a JSON response

I have a page that comes back as an UnexpectedPage in HtmlUnit, the response is JSON. Can I use HTMLUnit to parse this or will I need an additional library?
benstpierre
13 votes, 2 answers

How to save HtmlUnit cookies to a file?

I'd like to save HtmlUnit cookies to a file and on next run load them from that one. How can I do that? Thanks.
Fluffy
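One way to do this, sketched under assumptions: in HtmlUnit 2.x the cookie class com.gargoylesoftware.htmlunit.util.Cookie implements Serializable, so the client's cookie set can be written with plain Java serialization and added back to the CookieManager on the next run. The file name and the cookie are hypothetical. Java serialization ties the file to the HtmlUnit version that wrote it, so a text format (for example one line per cookie with name, value, domain and path) may be more robust if upgrades are expected.

```java
import com.gargoylesoftware.htmlunit.CookieManager;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.util.Cookie;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashSet;
import java.util.Set;

public final class CookieFileSketch {

    // Writes every cookie the client currently holds to the given file.
    static void save(WebClient webClient, File file) throws IOException {
        Set<Cookie> cookies = new HashSet<>(webClient.getCookieManager().getCookies());
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(cookies);
        }
    }

    // Reads cookies written by save() and adds them to the client's cookie manager.
    @SuppressWarnings("unchecked")
    static void load(WebClient webClient, File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            Set<Cookie> cookies = (Set<Cookie>) in.readObject();
            CookieManager manager = webClient.getCookieManager();
            for (Cookie cookie : cookies) {
                manager.addCookie(cookie);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("htmlunit-cookies", ".ser");
        file.deleteOnExit();
        try (WebClient first = new WebClient()) {
            // Hypothetical cookie; normally a server response would have set it.
            first.getCookieManager().addCookie(new Cookie("example.test", "session", "abc123"));
            save(first, file);
        }
        try (WebClient second = new WebClient()) {
            load(second, file);
            System.out.println(second.getCookieManager().getCookie("session").getValue());
        }
    }
}
```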
13 votes, 2 answers

crawl dynamic web page using htmlunit

I am crawling data using HtmlUnit from a dynamic webpage, which uses infinite scrolling to fetch data dynamically, just like Facebook's news feed. I used the following statement to simulate the scrolling down…
12 votes, 3 answers

HtmlUnit + Selenium within Production

I am currently using HtmlUnit and Selenium to drive it (WebDriver) within my production code. I am scraping and interacting with various websites programmatically with these libraries and am having some success and not experiencing memory issues…
Steven
12 votes, 2 answers

How to combine scrapy and htmlunit to crawl urls with javascript

I'm using Scrapy to crawl pages; however, I can't handle pages with JavaScript. People suggested I use HtmlUnit, so I installed it, but I don't know how to use it at all. Can anyone give me an example (Scrapy + HtmlUnit)? Thanks…
HjySix
12 votes, 3 answers

HtmlUnit to view source

HtmlUnit for Java is great, but I haven't been able to figure out how to view the full source or return the source of a website as a string. Can anyone help me with this? I know the following will read the site, but now I just want to return the source…
Jake Sankey
12 votes, 2 answers

How to use HtmlUnit in Java?

I'm trying to use HtmlUnit in Java to log into a website. First I enter the user name, then the password. After that I need to select an option from a dropdown box. Entering the user and password seemed to work, but when I try to select the item…
Peter