When I try to load a page with HtmlUnit, I always get a 301 (Moved Permanently) response, even though the exact same page loads fine in a browser.

The code that produces it is:

    public String getPage(String url) {
        try {
            WebClient webClient = new WebClient(BrowserVersion.CHROME);
            webClient.getOptions().setJavaScriptEnabled(false);
            webClient.getOptions().setRedirectEnabled(false);
            webClient.getOptions().setUseInsecureSSL(true);
            webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
            webClient.getOptions().setThrowExceptionOnScriptError(false);
            // webClient.getOptions().setTimeout();
            final HtmlPage page = webClient.getPage(url);
            return page.asText();
        } catch (IOException ex) {
            Logger.getLogger(Worker.class.getName()).log(Level.SEVERE, null, ex);
        } catch (FailingHttpStatusCodeException ex) {
            Logger.getLogger(Worker.class.getName()).log(Level.SEVERE, null, ex);
        }
        return null;
    }

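One plausible explanation, offered as an assumption rather than a confirmed diagnosis: the snippet calls setRedirectEnabled(false), which tells HtmlUnit not to follow redirects, so a 301 from the server is returned as-is instead of being followed (HtmlUnit follows redirects by default). HtmlUnit isn't part of the JDK, so this sketch reproduces the same effect with java.net.HttpURLConnection.setInstanceFollowRedirects against a throwaway local com.sun.net.httpserver.HttpServer rather than Instagram. The class name RedirectDemo and the /name and /final paths are hypothetical, made up for the illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class RedirectDemo {

    // Hypothetical local stand-in for the real site:
    // /name answers 301 pointing at /final, /final answers 200.
    static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/name", exchange -> {
            int port = exchange.getLocalAddress().getPort();
            exchange.getResponseHeaders().set("Location", "http://127.0.0.1:" + port + "/final");
            exchange.sendResponseHeaders(301, -1); // -1 means "no response body"
            exchange.close();
        });
        server.createContext("/final", exchange -> {
            byte[] body = "final page".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }

    // Requests /name and reports the status code the client ends up with.
    // followRedirects plays the role of HtmlUnit's setRedirectEnabled(...).
    static int statusCode(boolean followRedirects) {
        try {
            HttpServer server = startServer();
            try {
                int port = server.getAddress().getPort();
                HttpURLConnection conn = (HttpURLConnection)
                        URI.create("http://127.0.0.1:" + port + "/name").toURL().openConnection();
                conn.setInstanceFollowRedirects(followRedirects);
                int code = conn.getResponseCode();
                conn.disconnect();
                return code;
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("redirects disabled: " + statusCode(false)); // 301
        System.out.println("redirects enabled:  " + statusCode(true));  // 200
    }
}
```

The only difference between the two calls is whether the client may follow the Location header of the 301. If the same holds for HtmlUnit, dropping the setRedirectEnabled(false) line would be the first thing to try.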
Here, url is http://www.instagram.com/name (I also tried https, with the same result).

The log output is:

    Jul 20, 2015 1:52:20 PM com.gargoylesoftware.htmlunit.WebClient printContentIfNecessary
    INFO: statusCode=[301] contentType=[text/html]
    Jul 20, 2015 1:52:20 PM com.gargoylesoftware.htmlunit.WebClient printContentIfNecessary
    INFO: <html> <head><title>301 Moved Permanently</title></head> <body bgcolor="white"> <center><h1>301 Moved Permanently</h1></center> <hr><center>nginx</center> </body> </html>

However, when I open http://www.instagram.com/name in my browser, it loads fine. I've heard Jsoup may be useful for what I want to do (getting the text of a page), but I'm more familiar with HtmlUnit. If you have a fix for my code, or an alternative method, I'd be happy to try it.
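As one dependency-free alternative, here is a sketch (assuming Java 11+) that fetches a page with java.net.http.HttpClient configured to follow redirects. Note that it returns raw HTML, not extracted text like asText(). For the visible text you would still need an HTML parser. With Jsoup the rough equivalent is Jsoup.connect(url).get().text(), which follows redirects by default (worth confirming in the Jsoup docs for your version). The local server, its /name and /final paths, and the class name FetchFollowingRedirects are hypothetical stand-ins for the real site.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class FetchFollowingRedirects {

    // Fetches a URL, following redirects, and returns the final response body (raw HTML).
    static String fetch(String url) {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .followRedirects(HttpClient.Redirect.NORMAL) // follows 3xx except https -> http
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        try {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            return response.body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical local stand-in for the real site: /name redirects (301) to /final.
    static String demo() {
        HttpServer server;
        try {
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        server.createContext("/name", exchange -> {
            int port = exchange.getLocalAddress().getPort();
            exchange.getResponseHeaders().set("Location", "http://127.0.0.1:" + port + "/final");
            exchange.sendResponseHeaders(301, -1); // no response body
            exchange.close();
        });
        server.createContext("/final", exchange -> {
            byte[] body = "<html><body>final page</body></html>".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        try {
            return fetch("http://127.0.0.1:" + server.getAddress().getPort() + "/name");
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints <html><body>final page</body></html>
    }
}
```

Redirect.NORMAL is chosen over Redirect.ALWAYS so that an https page is never silently downgraded to http.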