I'm trying to extract financial information from a table using Jsoup. I've reviewed similar questions and can get their examples to work (for example, "Using Jsoup to extract data" and "Using JSoup To Extract HTML Table Contents").

I'm not sure why the code doesn't work on my URL.

Below are 3 different attempts. Any help would be appreciated.

String s = "http://financials.morningstar.com/valuation/price-ratio.html?t=AXP&region=usa&culture=en-US";

// Attempt 1
try {
    Document doc = Jsoup.connect("http://financials.morningstar.com/valuation/price-ratio.html?t=AXP&region=USA&culture=en_US").get();

    for (Element table : doc.select("table#currentValuationTable.r_table1.text2")) {
        for (Element row : table.select("tr")) {
            Elements tds = row.select("td");
            if (tds.size() > 6) {
                System.out.println(tds.get(0).text() + ":" + tds.get(1).text());
            }
        }
    }
} catch (IOException ex) {
    ex.printStackTrace();
}
// Attempt 2
try {
    Document doc = Jsoup.connect(s).get(); 
    for (Element table : doc.select("table#currentValuationTable.r_table1.text2")) {
        for (Element row : table.select("tr")) {
            Elements tds = row.select("td");
            for (int i = 0; i < tds.size(); i++) {
                System.out.println(tds.get(i).text());
            }
        }
    }        
} catch (IOException ex) {
    ex.printStackTrace();
}
// Attempt 3
try {
    Document doc = Jsoup.connect(s).get(); 
    Elements tableElements = doc.select("table#currentValuationTable.r_table1.text2");

    Elements tableRowElements = tableElements.select(":not(thead) tr");

    for (int i = 0; i < tableRowElements.size(); i++) {
        Element row = tableRowElements.get(i);
        System.out.println("row");
        Elements rowItems = row.select("td");
        for (int j = 0; j < rowItems.size(); j++) {
            System.out.println(rowItems.get(j).text());
        }
    }        
} catch (IOException ex) {
    ex.printStackTrace();
}
  • What do your attempts achieve / fail to achieve? – lfurini Jun 14 '15 at 15:38
  • Print what `Document` was able to read from the page (use `System.out.println(doc);`). I suspect the problem is that the HTML content you are looking for is added dynamically by JavaScript in the browser, which Jsoup can't do since it has no JavaScript support. In that case you should use a more powerful tool such as a web driver (for example Selenium). – Pshemo Jun 14 '15 at 15:39
  • Try disabling JavaScript and see if you can still see the tables in the browser... – Pshemo Jun 14 '15 at 15:46
  • @lfurini My attempts produce nothing. The output says "run: BUILD SUCCESSFUL (total time: 1 second)" –  Jun 14 '15 at 18:25
  • @Pshemo Not sure how to disable JavaScript. I will need some time to google that. Also, I printed out what the document said in each of the attempts. It's a lot of what looks to me like HTML. Is there something specific I should be looking for? –  Jun 14 '15 at 18:27
  • I mean disable JavaScript in the browser you are using. For instance, in Chrome: https://www.youtube.com/watch?v=i3mQxhcxu8Y – Pshemo Jun 14 '15 at 18:37
  • I disabled it, refreshed the page and then the content disappeared. ...So, your recommendation is to explore Selenium (or a tool like it)? BTW, thanks for the input. –  Jun 14 '15 at 18:47
  • Yes. Jsoup is simply an HTML parser, not a browser emulator, and it doesn't support JavaScript. You need another tool, such as Selenium WebDriver. – Pshemo Jun 14 '15 at 18:52
  • Understood. Thanks again. –  Jun 14 '15 at 18:59

1 Answer

Answer provided by Pshemo:

Print what Document was able to read from the page (use System.out.println(doc);). I suspect the problem is that the HTML content you are looking for is added dynamically by JavaScript in the browser, which Jsoup can't do since it has no JavaScript support. In that case you should use a more powerful tool such as a web driver (for example Selenium).
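
If the printed document is too long to read by eye, the same check can be automated: download the page exactly as the server sends it and look for the table's id. Below is a minimal sketch, assuming Java 11 or later (for `java.net.http.HttpClient`). The class `StaticHtmlCheck` and its methods `fetchRaw` and `mentionsId` are hypothetical names made up for this illustration; they are not part of Jsoup. The substring test is deliberately crude: if the server sends an empty placeholder element with that id and JavaScript fills it in later, the check still reports the id as present, so treat a positive result as a hint to inspect the markup rather than as proof.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper for illustration only; not part of Jsoup or the original post.
class StaticHtmlCheck {

    // Returns true if the raw markup contains an id attribute with the given value,
    // written with either double or single quotes.
    static boolean mentionsId(String html, String id) {
        return html.contains("id=\"" + id + "\"") || html.contains("id='" + id + "'");
    }

    // Downloads the page body as the server sends it. No scripts are executed,
    // which matches what a plain HTML parser gets to see.
    static String fetchRaw(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: StaticHtmlCheck <url> <element-id>");
            return;
        }
        String html = fetchRaw(args[0]);
        if (mentionsId(html, args[1])) {
            System.out.println("The id appears in the static HTML; inspect the markup and the selector.");
        } else {
            System.out.println("The id is not in the static HTML; the element is probably added by JavaScript.");
        }
    }
}
```

For the question above, the arguments would be the Morningstar URL and `currentValuationTable`. If the id is missing from the static HTML, no Jsoup selector can match it, whichever of the three attempts is used.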

Stephan