Essentially, like a bulletproof tank, I want my program to absorb 404 errors and keep on rolling, crushing the interwebs and leaving corpses dead and bloodied in its wake, or whatever.
I keep getting this error:
Exception in thread "main" org.jsoup.HttpStatusException: HTTP error fetching URL. Status=404, URL=https://en.wikipedia.org/wiki/Hudson+Township+%28disambiguation%29
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:537)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:493)
at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:205)
at org.jsoup.helper.HttpConnection.get(HttpConnection.java:194)
at Q.Wikipedia_Disambig_Fetcher.all_possibilities(Wikipedia_Disambig_Fetcher.java:29)
at Q.Wikidata_Q_Reader.getQ(Wikidata_Q_Reader.java:54)
at Q.Wikipedia_Disambig_Fetcher.all_possibilities(Wikipedia_Disambig_Fetcher.java:38)
at Q.Wikidata_Q_Reader.getQ(Wikidata_Q_Reader.java:54)
at Q.Runner.main(Runner.java:35)
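To make the goal concrete, the behaviour I'm after is roughly this (just a sketch; the crawl loop and the logging are made up for illustration):

import org.jsoup.HttpStatusException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.IOException;
import java.util.List;

public class TankCrawler {

    // Visit every URL; if one 404s (or otherwise blows up), note it and keep rolling.
    public static void crawl(List<String> urls) {
        for (String url : urls) {
            try {
                Document doc = Jsoup.connect(url).get();
                System.out.println("Fetched: " + doc.title());
                // ... real processing would go here ...
            } catch (HttpStatusException e) {
                // 404s and friends land here instead of killing the thread
                System.err.println("Skipping " + e.getUrl() + " (HTTP " + e.getStatusCode() + ")");
            } catch (IOException e) {
                // connection/timeout problems, also survivable
                System.err.println("Skipping " + url + ": " + e.getMessage());
            }
        }
    }
}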
But I can't understand why, because I check that I have a valid URL before navigating to it. What about my checking procedure is incorrect?
I tried examining the other Stack Overflow questions on this subject, but they're not very authoritative. I also implemented many of the solutions from this one and this one; so far nothing has worked.
I'm using the Apache Commons UrlValidator; this is the code I've been using most recently:
//get its normal wiki disambig page
String URL_check = "https://en.wikipedia.org/wiki/" + associated_alias;
UrlValidator urlValidator = new UrlValidator();
if ( urlValidator.isValid( URL_check ) )
{
    Document docx = Jsoup.connect( URL_check ).get();
    //this can handle the less structured ones.
and
//check the validity of the URL
String URL_czech = "https://www.wikidata.org/wiki/Special:ItemByTitle?site=en&page=" + associated_alias + "&submit=Search";
UrlValidator urlValidator = new UrlValidator();
if ( urlValidator.isValid( URL_czech ) )
{
    URL wikidata_page = new URL( URL_czech );
    URLConnection wiki_connection = wikidata_page.openConnection();
    BufferedReader wiki_data_pagecontent = new BufferedReader(
            new InputStreamReader(
                    wiki_connection.getInputStream()));
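For what it's worth, I also wondered whether the check needs to look at the actual HTTP status rather than just the URL syntax. Something along these lines is what I had in mind (untested sketch; pageExists is just a name I made up):

import java.net.HttpURLConnection;
import java.net.URL;

public class PageChecker {
    // Returns true only if the server actually answers with HTTP 200 for this URL.
    public static boolean pageExists(String urlString) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(urlString).openConnection();
            conn.setRequestMethod("HEAD");   // no need to download the body
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            return conn.getResponseCode() == HttpURLConnection.HTTP_OK;
        } catch (Exception e) {
            return false;                    // any failure counts as "page not there"
        }
    }
}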