-6

I am having two problems here:

The following block of code has me confused. First, I do not understand what the code is actually doing; I copied it from a tutorial, and it seems to do what I want. If anyone can explain it piece by piece, that would be really helpful.

The second problem is that it throws an ArrayIndexOutOfBoundsException and I do not know why, perhaps because I do not understand the code. I really need clarification.

   try {
        Document searchLink = Jsoup.connect("https://www.google.com.ng/search?dcr=0&source=hp&ei=5-cIWuZ30cCwB7aUhrAN&q=" + URLEncoder.encode(searchValue, encoding))
                .userAgent("Mozilla/5.0").get();
        String websiteLink = searchLink.getElementsByTag("cite").get(0).text();


        // Build the Wikipedia API URL: the "titles" parameter is set to our own article title.
        // We use the String method replaceAll() to strip the Wikipedia URL prefix from the link
        // found through Google, which leaves only the article title.
        String wikiAPItoSearch = "https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=json&titles=" 
                                + URLEncoder.encode(websiteLink.replaceAll("https://en.wikipedia.org/wiki/", ""),encoding);
        System.out.println(wikiAPItoSearch);

        // Extraction of the text.
        // From this point down, I can't really follow what is happening.
        HttpURLConnection httpconn = (HttpURLConnection) new URL(wikiAPItoSearch).openConnection();
        httpconn.addRequestProperty("User-Agent", "Mozilla/5.0");

        BufferedReader bf = new BufferedReader(new InputStreamReader(httpconn.getInputStream()));

        //read line by line
        String response = bf.lines().collect(Collectors.joining());
        bf.close();
        // It throws ArrayIndexOutOfBoundsException here
        String result = response.split("\"extract\":\"")[1];
        System.out.println(result);
    } catch (IOException e) {
        // TODO: handle exception
        e.printStackTrace();
    }
khelwood
  • Why don't you print the response variable to the console and check whether it really contains the "extract":" text? – johnII Nov 13 '17 at 12:06

2 Answers

1

I don't think anyone will take the time to explain the code for you. A good opportunity for you to do some debugging.

ArrayIndexOutOfBounds comes from response.split("\"extract\":\"")[1]. There is no guarantee that the String response can be split into at least 2 parts.

Add a check to avoid the error. Instead of...

    String result = response.split("\"extract\":\"")[1];

use...

    String[] parts = response.split("\"extract\":\"");
    String result;
    if (parts.length >= 2) {
        result = parts[1];
    } else {
        result = "Error..." + response; // a simple fallback 
    }

This is how split works:

    String input = "one,two,three";
    String[] parts = input.split(",");
    System.out.println(parts[0]); // prints 'one'
    System.out.println(parts[2]); // prints 'three'

So in your case, [1] means the second item in the parts array. "\"extract\":\"" has to appear at least once in the response; otherwise there will be only one item in the parts array, and you will get an error when you try to reach the second item (since it doesn't exist). Also keep in mind that .split accepts a regexp string. "\"extract\":\"" happens to contain no regexp reserved characters, so it is matched literally, but a delimiter containing characters such as . or | would need escaping, for example with Pattern.quote.
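To see the behaviour described above in isolation, here is a minimal sketch. The class name SplitDemo, the helper textAfter, and both sample strings are made up for illustration; they are not real Wikipedia responses. Pattern.quote is used so the marker is always treated as literal text, and the limit of 2 splits only at the first occurrence.

```java
import java.util.regex.Pattern;

class SplitDemo {
    // Hypothetical helper: returns the text after the first occurrence of
    // marker, or null when the marker does not occur in response at all.
    static String textAfter(String response, String marker) {
        // Pattern.quote treats the marker as literal text, not as a regexp.
        // The limit of 2 means "split at the first occurrence only".
        String[] parts = response.split(Pattern.quote(marker), 2);
        return parts.length == 2 ? parts[1] : null;
    }

    public static void main(String[] args) {
        String marker = "\"extract\":\"";
        // Hand-written sample strings (assumptions, not real API output).
        String withMarker = "{\"title\":\"Java\",\"extract\":\"Java is a language\"}";
        String withoutMarker = "{\"title\":\"Java\",\"revisions\":[]}";

        // No match: split returns the whole input as a single element,
        // so index [1] does not exist.
        System.out.println(withoutMarker.split(Pattern.quote(marker)).length); // prints 1

        System.out.println(textAfter(withMarker, marker));    // prints: Java is a language"}
        System.out.println(textAfter(withoutMarker, marker)); // prints: null
    }
}
```

Returning null is only one possible fallback; the point is that the array length is checked before index 1 is read.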

Stefan
  • The else block got executed and returned an unformatted string. What is happening? Does this mean the string was split into fewer than 2 parts? If so, then why the out-of-bounds error on index [1]? – Olalekan Adebari Nov 13 '17 at 12:40
  • The string was split into fewer than 2 parts. This means you get an ArrayIndexOutOfBoundsException when you try to access the non-existent second part. Indexes are zero-based, so the first one is [0] and the second one is [1]. – Adder Nov 13 '17 at 12:49
0

Oops... I realized the error was caused by the API URL I was using. The response to the query I got from Wikimedia does not contain "extract", so there was nothing to split on. I checked other Stack Overflow posts for a cleaner API call, specifically one whose response uses "extract" as a delimiter.

This is the new API URL I got:

https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro=&explaintext=&titles=

This was the former one, which caused the error:

https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=json&titles=

I think the error came from my not understanding the process in depth. Thanks for the responses.
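For anyone comparing the two URLs, here is a rough sketch of how the new one can be built and how the response can be checked before use. The class ExtractDemo, its method names, and the sample JSON strings are assumptions for illustration only; the samples are hand-written and merely imitate the difference between the two responses. A JSON parser would be a more robust way to read the real response.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class ExtractDemo {
    // The URL prefix quoted above; the article title is appended to it.
    static final String EXTRACTS_API =
        "https://en.wikipedia.org/w/api.php?format=json&action=query"
        + "&prop=extracts&exintro=&explaintext=&titles=";

    // Strips the article URL prefix and appends the encoded title to the API URL.
    static String buildUrl(String articleUrl) {
        String title = articleUrl.replace("https://en.wikipedia.org/wiki/", "");
        try {
            return EXTRACTS_API + URLEncoder.encode(title, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Returns everything after "key":" in json, or null if the key is absent.
    // Note that the result still includes the trailing JSON characters.
    static String textAfterKey(String json, String key) {
        String marker = "\"" + key + "\":\"";
        int start = json.indexOf(marker);
        return start < 0 ? null : json.substring(start + marker.length());
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("https://en.wikipedia.org/wiki/Lagos"));

        // Hand-written samples (assumptions, not real API output).
        String oldStyle = "{\"revisions\":[]}";
        String newStyle = "{\"extract\":\"Lagos is a city\"}";
        System.out.println(textAfterKey(oldStyle, "extract")); // prints: null
        System.out.println(textAfterKey(newStyle, "extract")); // prints: Lagos is a city"}
    }
}
```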