While fetching results from duckduckgo.com with different queries, I get this exception after 20-30 iterations:

Exception in thread "main" org.jsoup.HttpStatusException: HTTP error fetching URL. Status=400, URL=https://duckduckgo.com/html/?q=  Hermann_William_Goering
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:682)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:629)
at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:261)
at org.jsoup.helper.HttpConnection.get(HttpConnection.java:250)
at WebContextExtractor.DDGresultsScraping(WebContextExtractor.java:378)
at WebContextExtractor.main(WebContextExtractor.java:521)

I have no idea what the problem is. If I open that link manually in a browser, I can reach the page without any problem.

The error occurs when I try to fetch the page as a document with this simple code:

Connection conn = Jsoup.connect(DUCKDUCKGO_SEARCH_URL + query)
            .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                    + "(KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36"); 

Document doc = conn.get(); // <-- the exception is thrown here
  • HTTP 400 means the request was malformed. You can wrap the line `Document doc = conn.get();` in a `try-catch` and print the value of `query` in the `catch` block to see which kind of `query` leads to the 400 error – Javdroider Sep 01 '17 at 09:34
  • I have done that, and the query is " Hermann_William_Goering". This doesn't seem to be malformed, though. There are two spaces before the word, but they are not a problem in many other queries – Jacopo Rufini Sep 01 '17 at 09:50
  • Try URL-encoding the query. A space should be %20 to be 100% correct – r3dst0rm Sep 02 '17 at 09:06
  • Do you have an answer for this situation? – GoldenScrew Dec 22 '17 at 17:36
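
The URL-encoding suggestion from the comments can be sketched roughly as below. This is only a minimal sketch under some assumptions: the value of `DUCKDUCKGO_SEARCH_URL` is inferred from the URL in the stack trace, and `QueryUrlBuilder` and `buildSearchUrl` are hypothetical names that do not appear in the original code. Nothing in this thread confirms that encoding the query actually removes the 400 error.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class QueryUrlBuilder {
    // Assumed value, inferred from the URL shown in the stack trace.
    static final String DUCKDUCKGO_SEARCH_URL = "https://duckduckgo.com/html/?q=";

    // Hypothetical helper: trims the query and encodes it for use in a URL query string.
    static String buildSearchUrl(String query) {
        try {
            // URLEncoder writes a space as '+' (form encoding) and a literal '+' as %2B,
            // so replacing '+' afterwards only turns encoded spaces into %20.
            String encoded = URLEncoder.encode(query.trim(), "UTF-8").replace("+", "%20");
            return DUCKDUCKGO_SEARCH_URL + encoded;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform.
            throw new IllegalStateException("UTF-8 is not supported", e);
        }
    }

    public static void main(String[] args) {
        // Leading spaces are trimmed; underscores are left unchanged by URLEncoder.
        System.out.println(buildSearchUrl("  Hermann_William_Goering"));
        // Inner spaces become %20.
        System.out.println(buildSearchUrl("Hermann William Goering"));
    }
}
```

The resulting string could then be passed to `Jsoup.connect(...)` in place of `DUCKDUCKGO_SEARCH_URL + query`.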

0 Answers