Questions tagged [htmlunit]

HtmlUnit is a "headless browser": it has no browser GUI and does no rendering, but it includes CSS and JavaScript engines to simulate a real browser. Its primary purposes are testing and information extraction.

HtmlUnit is a "GUI-Less browser for Java programs". It models HTML documents and provides an API that allows you to invoke pages, fill out forms, click links, and so on, just as you do in your "normal" browser.

It has fairly good JavaScript support (which is constantly improving) and is able to work even with quite complex AJAX libraries, simulating Chrome, Firefox or Internet Explorer depending on the configuration used.
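
As a rough illustration of the API described above, here is a minimal sketch that loads a page, types into a form field, and follows a link while simulating Chrome. It assumes the HtmlUnit 2.x package names (`com.gargoylesoftware.htmlunit`; the 3.x releases use `org.htmlunit` instead). The class name `HtmlUnitSketch`, the `example.test` URLs, and the page markup are hypothetical, and the pages are served from HtmlUnit's `MockWebConnection` so that no network access is needed.

```java
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.MockWebConnection;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;

import java.net.URL;

public class HtmlUnitSketch {

    // Loads a canned start page, fills in a form field, clicks a link,
    // and returns the title of the page the link leads to.
    public static String run() throws Exception {
        // Hypothetical URLs; they are never resolved over the network.
        URL start = new URL("http://example.test/");
        URL next = new URL("http://example.test/next.html");

        // Canned responses stand in for a real web server.
        MockWebConnection connection = new MockWebConnection();
        connection.setResponse(start,
                "<html><head><title>Start</title></head><body>"
                + "<form name='search'><input type='text' name='q'></form>"
                + "<a href='next.html'>Next</a>"
                + "</body></html>");
        connection.setResponse(next,
                "<html><head><title>Next page</title></head><body></body></html>");

        // The BrowserVersion argument selects which browser is simulated.
        try (WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
            webClient.setWebConnection(connection);

            // "Invoke" the page.
            HtmlPage page = webClient.getPage(start);

            // Fill out a form field by simulating keystrokes.
            HtmlForm form = page.getFormByName("search");
            HtmlTextInput query = form.getInputByName("q");
            query.type("htmlunit");

            // Click a link; click() returns the page that results from it.
            HtmlAnchor link = page.getAnchorByText("Next");
            HtmlPage result = link.click();
            return result.getTitleText();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());
    }
}
```

Against a real site, the `MockWebConnection` lines would simply be dropped and `getPage` called with the live URL; the mock is used here only to keep the sketch self-contained.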

It is typically used for testing purposes or to retrieve information from web sites.

HtmlUnit is not a generic unit testing framework. It is specifically a way to simulate a browser for testing purposes and is intended to be used within another testing framework such as JUnit or TestNG.

HtmlUnit is used as the underlying "browser" by several open-source tools, such as Canoo WebTest, JWebUnit, Selenium WebDriver, JSFUnit, and Celerity, among others.

HtmlUnit was originally written by Mike Bowler of Gargoyle Software and is released under the Apache 2 license.

1835 questions
7 votes, 1 answer

How to take a screenshot with HTML-Unit?

I'm from Germany, so please excuse any awkward sentences. I've written a web-based application, and now I want to take a screenshot of the page at one point in the code. I'm using HtmlUnit, so I'd like to know how to do it with that; it would be bad if I needed…
aGuest
7 votes, 2 answers

How stable and fast is HtmlUnit?

I'm upgrading from selenium-1 to selenium-2 and trying out the new HtmlUnit driver. I've tried a few basic tests on it (open a page, get_text, ...) and it seems extremely slow (I think the Chrome/FF remote drivers are faster). Extremely…
Guy
7 votes, 1 answer

Selenium HtmlUnitDriver Web Scrape Got Captcha Page From EC2 Server

I wrote a simple web scraper to scrape expedia.com. Using Java Selenium HtmlUnitDriver, I was able to scrape data from the site successfully when running it locally. However, when I deploy it to an EC2 server, it always returns the page where…
7 votes, 1 answer

How to write Event Handlers and detect certain JavaScript calls using HTMLUnit?

I want to use the Java API HtmlUnit to count the eval() calls made by the JavaScript on a web page. However, HtmlUnit doesn't have a built-in handler for this type of JavaScript function. How can this be done? Thanks.
ilikeyoyo
7 votes, 2 answers

htmlunit Cannot read property "push" from undefined

I'm trying to crawl a website using HtmlUnit. Whenever I run it, though, it only outputs the following error: Caused by: net.sourceforge.htmlunit.corejs.javascript.EcmaError: TypeError: Cannot read property "push" from undefined…
Maverick283
7 votes, 3 answers

Impossible site for HtmlUnit?

I cannot, for the life of me, rig HtmlUnit up to grab this…
Stu Kalide
7 votes, 1 answer

Can HtmlUnit handle JavaScript redirects?

Instead of automatically following JavaScript redirects, can I force HtmlUnit to return the URL the JavaScript wants to redirect me to? // context: If there are five JavaScript redirects in a row, I can only see the URL of the page where it stopped - I…
Marco
7 votes, 1 answer

calling a JavaScript function with HTMLUnit

I'm trying to call the function showPage('3'); of this page, so that I can use the page source afterwards. I tried to do it with HtmlUnit like so: WebClient webClient = new WebClient(); webClient.waitForBackgroundJavaScriptStartingBefore(10000); HtmlPage page…
Der Fede
7 votes, 2 answers

Process AJAX request in Htmlunit

I have written a program to scrape the source code from a web page after a button is clicked. I am unable to scrape the right page because I believe an AJAX request is being sent, and I am not waiting for the response to arrive. My code is…
Ctech45
7 votes, 2 answers

How to find div inside another div using HtmlUnit?

I am working on a project in which I need to scrape some information from different websites. I am using HtmlUnit for this purpose, but the problem is that I am unable to traverse the elements on a page. Example:
Kishan_KP
7 votes, 1 answer

What is the trade-off for disabling CSS in HTMLUnit?

I experienced slowness in HTMLUnit 2.12, and therefore disabled CSS as explained in "HTMLUnit : super slow execution?". I want to understand what the trade-off is. Does it mean that I cannot use XPath selectors? Are there other trade-offs?
David Michael Gang
7 votes, 1 answer

How to get the pure raw HTML of a page in HTMLUnit while ignoring JavaScript and CSS?

I just want the text content of the page, and I want the fetching to be as lightweight as possible. Can I turn off all the parsing and additional loading of JavaScript, CSS and other external content that HTMLUnit does out of the box?
Thomas
6 votes, 1 answer

Is HtmlUnit 2.8 getFirstByXPath different from HtmlUnit 1.14 getFirstByXPath?

I have a site structure that looks something like this:
Item 1 Desc 1
jamesv
6 votes, 1 answer

Restricting Selenium/Webdriver/HtmlUnit to a certain domain

While using Selenium/WebDriver for web scraping, I realized that the target site has a Google Analytics script running. Is there a way to make Selenium/WebDriver/HtmlUnit avoid certain URLs or domains? Thanks,
Ali Salehi
6 votes, 3 answers

Login check using HtmlUnit

Hi, I want to log in to some third-party sites using HtmlUnit, and HtmlUnit should be able to tell me whether the login attempt on the given site was successful or not. Is there any way to perform this task using HtmlUnit? Please help…
user737865