Extracting Table Data using JSoup

Question

I'm trying to extract financial information from a table using JSoup. I've reviewed similar questions and can get their examples to work (here are two: "Using Jsoup to extract data" and "Using JSoup To Extract HTML Table Contents").

I'm not sure why the code doesn't work on my URL.

Below are 3 different attempts. Any help would be appreciated.

String s = "http://financials.morningstar.com/valuation/price-ratio.html?t=AXP&region=usa&culture=en-US";

//Attempt 1
try {
    Document doc = Jsoup.connect("http://financials.morningstar.com/valuation/price-ratio.html?t=AXP&region=USA&culture=en_US").get();

    for (Element table : doc.select("table#currentValuationTable.r_table1.text2")) {
        for (Element row : table.select("tr")) {
            Elements tds = row.select("td");
            if (tds.size() > 6) {
                System.out.println(tds.get(0).text() + ":" + tds.get(1).text());
            }
        }
    }
} 
catch (IOException ex) {
    ex.printStackTrace();
}

// Attempt 2
try {
    Document doc = Jsoup.connect(s).get(); 
    for (Element table : doc.select("table#currentValuationTable.r_table1.text2")) {
        for (Element row : table.select("tr")) {
            Elements tds = row.select("td");
            for (int i = 0; i < tds.size(); i++) {
                System.out.println(tds.get(i).text());
            }
        }
    }        
} 
catch (IOException ex) {
    ex.printStackTrace();
}

//Attempt 3
try {
    Document doc = Jsoup.connect(s).get(); 
    Elements tableElements = doc.select("table#currentValuationTable.r_table1.text2");

    Elements tableRowElements = tableElements.select(":not(thead) tr");

    for (int i = 0; i < tableRowElements.size(); i++) {
        Element row = tableRowElements.get(i);
        System.out.println("row");
        Elements rowItems = row.select("td");
        for (int j = 0; j < rowItems.size(); j++) {
            System.out.println(rowItems.get(j).text());
        }
    }        
} 
catch (IOException ex) {
    ex.printStackTrace();
}

Print what `Document` was able to read from the page (use `System.out.println(doc);`). I suspect the HTML content you are looking for is added dynamically by JavaScript in the browser, which Jsoup can't do because it has no JavaScript support. In that case you should use a more powerful tool such as a web driver (for example, Selenium). — Pshemo, Jun 14 '15 at 15:39
Try disabling JavaScript and see whether you can still see the tables in the browser... — Pshemo, Jun 14 '15 at 15:46
@Ifurnini My attempts produce nothing. The output says "run: BUILD SUCCESSFUL (total time: 1 second)" — , Jun 14 '15 at 18:25
@Pshemo I'm not sure how to disable JavaScript. I'll need some time to google that. Also, I printed out the document in each of the attempts. It's a lot of what looks to me like HTML. Is there something specific I should be looking for? — , Jun 14 '15 at 18:27
I mean disable JavaScript in the browser you are using. For instance, in Chrome: https://www.youtube.com/watch?v=i3mQxhcxu8Y — Pshemo, Jun 14 '15 at 18:37
I disabled it, refreshed the page and then the content disappeared. ...So, your recommendation is to explore Selenium (or a tool like it)? BTW, thanks for the input. — , Jun 14 '15 at 18:47
Yes. Jsoup is simply an HTML parser, not a browser emulator, and it doesn't support JavaScript. You need another tool, such as Selenium WebDriver. — Pshemo, Jun 14 '15 at 18:52

score 0 · Answer 1 · answered Jan 21 '16 at 09:39

Answer provided by Pshemo:

Print what Document was able to read from the page (use System.out.println(doc);). I suspect the HTML content you are looking for is added dynamically by JavaScript in the browser, which Jsoup can't do because it has no JavaScript support. In that case you should use a more powerful tool such as a web driver (for example, Selenium).
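
One way to act on this advice without reading the whole printed document by eye is to check whether the HTML the server sent contains the table rows at all. The following is a minimal, dependency-free sketch of that check, not part of the original answer. The class name `StaticTableCheck`, the method `countStaticRows`, and both sample HTML strings are hypothetical and do not reproduce the real page's markup. It uses a crude regular expression that would miscount nested tables, so treat it as a diagnostic only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StaticTableCheck {

    /**
     * Counts the <tr> tags inside the first <table> carrying the given id.
     * Returns -1 when the raw HTML holds no such table, which suggests the
     * table is built later by JavaScript. Crude regex check, diagnostic only.
     */
    static int countStaticRows(String html, String tableId) {
        Pattern tablePattern = Pattern.compile(
                "<table\\b[^>]*\\bid\\s*=\\s*[\"']" + Pattern.quote(tableId)
                        + "[\"'][^>]*>(.*?)</table>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        Matcher table = tablePattern.matcher(html);
        if (!table.find()) {
            return -1; // no table with that id in the static markup
        }
        Matcher row = Pattern.compile("<tr\\b", Pattern.CASE_INSENSITIVE)
                .matcher(table.group(1));
        int rows = 0;
        while (row.find()) {
            rows++;
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical page shell: the table is meant to be filled in by a script.
        String shell = "<html><body><div id=\"currentValuationTable\"></div>"
                + "<script>/* builds the table in the browser */</script></body></html>";
        // Hypothetical page where the server sends the rows directly.
        String full = "<table id=\"currentValuationTable\" class=\"r_table1 text2\">"
                + "<tr><td>A</td><td>B</td></tr></table>";

        System.out.println(countStaticRows(shell, "currentValuationTable")); // prints -1
        System.out.println(countStaticRows(full, "currentValuationTable"));  // prints 1
    }
}
```

With Jsoup itself, a comparable check would be to print `doc.select("table#currentValuationTable tr").size()`. A result of 0 on the fetched document, while the browser shows rows, points to JavaScript-generated content.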
