I am building a small Java application that fetches five Wikipedia pages and searches for substrings in their HTML source. I am using the library org.apache.commons.lang3.StringUtils. However, a Wikipedia article can be large, and there seems to be a limitation in StringUtils:
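import static org.apache.commons.lang3.StringUtils.substringBetween;
import java.io.IOException;
import org.jsoup.Jsoup;

String html;
try {
    // Download the page and serialize the parsed document back to HTML
    html = Jsoup.connect("http://en.wikipedia.org/wiki/Canada").get().html();
} catch (IOException e) {
    html = "";
}
String trimmedHtml = substringBetween(html, "<html>", "</html>");
System.out.println(html);        // prints the whole source code fine
System.out.println(trimmedHtml); // prints null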
Why does the console print null for trimmedHtml? The output should be (almost) as large as that of html. Is there a maximum length for the string output or for the parameters of substringBetween()?
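One way to narrow this down is to test the delimiter lookup separately from the string length. The sketch below uses a hypothetical helper, between(), written against the JDK only. It is meant to mirror the documented behaviour of StringUtils.substringBetween(str, open, close), which returns null when an argument is null or when either delimiter is not found as a literal substring. The sample strings are made up for illustration, and String.repeat assumes Java 11 or later.

```java
public class SubstringBetweenDemo {

    // Hypothetical helper mirroring the documented behaviour of
    // StringUtils.substringBetween(str, open, close): returns null when any
    // argument is null or when either delimiter is not found literally.
    static String between(String str, String open, String close) {
        if (str == null || open == null || close == null) {
            return null;
        }
        int start = str.indexOf(open);
        if (start == -1) {
            return null; // opening delimiter not present verbatim
        }
        int end = str.indexOf(close, start + open.length());
        if (end == -1) {
            return null; // closing delimiter not present after the opening one
        }
        return str.substring(start + open.length(), end);
    }

    public static void main(String[] args) {
        // Made-up sample: the root tag carries an attribute, so the literal
        // string "<html>" never occurs in it.
        String withAttributes = "<html lang=\"en\"><body>Canada</body></html>";
        String bare = "<html><body>Canada</body></html>";

        System.out.println(between(withAttributes, "<html>", "</html>")); // null
        System.out.println(between(bare, "<html>", "</html>"));           // <body>Canada</body>

        // A very large body between bare tags still matches, so the length
        // of the input alone does not produce null here.
        String big = "<html>" + "x".repeat(5_000_000) + "</html>";
        System.out.println(between(big, "<html>", "</html>").length());   // 5000000
    }
}
```

If the same kind of check run on the downloaded page also returns null, it may be worth printing the first few hundred characters of html to see exactly how the opening tag is written.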