Jsoup wiki scraper: how do I get the table of contents box?

Question

I am having trouble scraping the table of contents from a wiki page. I am writing a simple web scraper for a personal project and I can't figure out how to extract this data.

Here is my attempt at scraping the table of contents from any given wiki page:

    String contentOver = doc.select("#toclimit-3 > li").first().text();

Here is the HTML from the page I want to scrape. How do I get just the word "Chronology"?

    <ul> 
    <li class="toclevel-1 tocsection-1"><a href="#Chronology"><span class="tocnumber">1</span> <span class="toctext">Chronology</span></a></li>
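A minimal sketch of the direct approach, assuming the jsoup library is on the classpath: it parses the fragment above (with a closing `</ul>` added so the snippet is complete) and takes the first element with class `toctext`. The class and method names are illustrative only, not part of jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TocFirstEntry {

    // The TOC fragment from the question; </ul> added so the markup is complete.
    static final String HTML =
            "<ul>"
            + "<li class=\"toclevel-1 tocsection-1\"><a href=\"#Chronology\">"
            + "<span class=\"tocnumber\">1</span> <span class=\"toctext\">Chronology</span>"
            + "</a></li>"
            + "</ul>";

    /** Returns the text of the first element with class "toctext", or null if none matches. */
    static String firstTocText(Document doc) {
        Element span = doc.select(".toctext").first(); // first() returns null when nothing matches
        return span == null ? null : span.text();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML); // fragment gets wrapped in <html><body>
        System.out.println(firstTocText(doc)); // prints "Chronology"
    }
}
```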

`doc.select(".toctext").first().text();` – StreamingBits Mar 12 '14 at 19:41

score 1 · Accepted Answer · answered Mar 12 '14 at 19:07


You can just get it by its class name, `toctext`:

    // Take the first top-level TOC entry, then only the text of its .toctext span
    Element li = doc.select("#toclimit-3 > li").first();
    String result = li.select(".toctext").first().text();
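Scoping to the `.toctext` span matters because `Element.text()` on the whole `<li>` concatenates all descendant text, so the `tocnumber` span ("1") comes along too. The sketch below assumes jsoup on the classpath and uses `li.toclevel-1` in place of `#toclimit-3 > li`, since the fragment in the question has no element with that id; on a real page, substitute whatever selector matches your TOC.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TocScopedSelect {

    // Same fragment as in the question, with </ul> added.
    static final String HTML =
            "<ul>"
            + "<li class=\"toclevel-1 tocsection-1\"><a href=\"#Chronology\">"
            + "<span class=\"tocnumber\">1</span> <span class=\"toctext\">Chronology</span>"
            + "</a></li>"
            + "</ul>";

    /** Text of the whole first top-level <li>, including the tocnumber span. */
    static String wholeItemText(Document doc) {
        Element li = doc.select("li.toclevel-1").first();
        return li == null ? null : li.text();
    }

    /** Text of only the .toctext span inside the first top-level <li>. */
    static String entryTitle(Document doc) {
        Element li = doc.select("li.toclevel-1").first();
        if (li == null) {
            return null; // selector matched nothing on this page
        }
        Element span = li.select(".toctext").first();
        return span == null ? null : span.text();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        System.out.println(wholeItemText(doc)); // prints "1 Chronology"
        System.out.println(entryTitle(doc));    // prints "Chronology"
    }
}
```

Checking for `null` before calling `text()` avoids the `NullPointerException` you would otherwise get when a selector matches nothing.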


alecxe


@StreamingBits could you try just this `doc.select(".toctext").first().text();`? – alecxe Mar 12 '14 at 19:37
@StreamingBits Omit the `first().text()` and take an approach similar to this: http://stackoverflow.com/a/7039950/771848. – alecxe Mar 12 '14 at 19:42
Wouldn't that just select all the source code? I only want to select all of those elements. – StreamingBits Mar 12 '14 at 19:59
@StreamingBits yup, replace "*" with ".toctext". – alecxe Mar 12 '14 at 20:02
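Following the comment thread, a sketch that collects every TOC entry rather than only the first: drop `first().text()` and iterate over the `Elements` returned by `select(".toctext")`. It assumes jsoup on the classpath; the second `<li>` ("Example B") is a made-up entry added only so there is more than one item to collect.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TocAllEntries {

    // Question's fragment plus a hypothetical second entry for illustration.
    static final String HTML =
            "<ul>"
            + "<li class=\"toclevel-1 tocsection-1\"><a href=\"#Chronology\">"
            + "<span class=\"tocnumber\">1</span> <span class=\"toctext\">Chronology</span></a></li>"
            + "<li class=\"toclevel-1 tocsection-2\"><a href=\"#Example_B\">"
            + "<span class=\"tocnumber\">2</span> <span class=\"toctext\">Example B</span></a></li>"
            + "</ul>";

    /** Returns the text of every .toctext element, in document order. */
    static List<String> tocTitles(Document doc) {
        List<String> titles = new ArrayList<>();
        for (Element span : doc.select(".toctext")) { // Elements is iterable
            titles.add(span.text());
        }
        return titles;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        for (String title : tocTitles(doc)) {
            System.out.println(title);
        }
    }
}
```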
