Using Jsoup to extract data

Question

I am using jsoup to extract data from a table on this page: http://www.moneycontrol.com/stocks/marketstats/gainerloser.php?optex=BSE&opttopic=topgainers&index=-1

I have referred to Using JSoup To Extract HTML Table Contents and other similar questions, but my code does not print any data. Could someone please provide the code required to achieve this?

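import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class TestClass {

    public static void main(String[] args) throws IOException {
        // Fetch and parse the page
        Document doc = Jsoup.connect("http://www.moneycontrol.com/stocks/marketstats/gainerloser.php?optex=BSE&opttopic=topgainers&index=-1").get();

        // Visit every row of each <table class="tablehead">
        for (Element table : doc.select("table.tablehead")) {
            for (Element row : table.select("tr")) {
                Elements tds = row.select("td");
                // Only rows with more than six cells are printed
                if (tds.size() > 6) {
                    System.out.println(tds.get(0).text() + ":" + tds.get(1).text());
                }
            }
        }
    }
}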

It would probably be helpful to see your code in order to help you... — quaylar, Feb 20 '12 at 09:30
The [Terms of Use](http://www.moneycontrol.com/cdata/termsofuse.php) suggest that such acts are not permitted without the express written permission of moneycontrol.com. If you have their permission, ask them about the preferred API (organized by them) for accessing the data. E.g., I notice one of the links mentions RSS feeds. That is a much more 'machine friendly' form of information than HTML. — Andrew Thompson, Feb 20 '12 at 09:39
I would like to get the names of the top gainers in the table. I have to tweak the code a little, but I don't know exactly what to change, as I am new to jsoup. — user1092042, Feb 20 '12 at 13:00
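If the site does offer an RSS feed, as the Terms of Use comment suggests, jsoup can read it with its XML parser instead of scraping HTML. The sketch below assumes a standard RSS 2.0 layout; the feed content is invented, and `RssTitlesSketch` and `itemTitles` are hypothetical names. Only `Jsoup.parse(String, String, Parser)` and `Parser.xmlParser()` are jsoup API.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

class RssTitlesSketch {

    // Extracts the <title> of every <item> in an RSS 2.0 document
    static List<String> itemTitles(String rssXml) {
        // The XML parser keeps the feed's structure instead of applying HTML rules
        Document feed = Jsoup.parse(rssXml, "", Parser.xmlParser());
        List<String> titles = new ArrayList<>();
        // "item > title" skips the channel's own <title>
        for (Element title : feed.select("item > title")) {
            titles.add(title.text());
        }
        return titles;
    }

    public static void main(String[] args) {
        // Invented sample feed; a real feed URL would have to come from the site
        String rss = "<rss version='2.0'><channel><title>Market news</title>"
            + "<item><title>Alpha Ltd gains</title></item>"
            + "<item><title>Beta Corp gains</title></item>"
            + "</channel></rss>";
        itemTitles(rss).forEach(System.out::println);
    }
}
```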

score 1 · Accepted Answer · answered Feb 20 '12 at 20:00


If you want to get the content of the table (not the header), you need to change the table selector:

for (Element table : doc.select("table.tbldata14"))

instead of

 for (Element table : doc.select("table.tablehead"))
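A minimal, offline sketch of the selector change, assuming markup shaped roughly like the 2012 page: the class names `tablehead` and `tbldata14` come from the question and answer, while the sample rows, the `GainerTableSketch` class and the `extractRows` helper are hypothetical. It parses an inline HTML string with `Jsoup.parse` instead of fetching the live page. It also requires only two `<td>` cells per row rather than the question's `tds.size() > 6`; if the real rows had fewer than seven cells, that original guard would skip every row.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class GainerTableSketch {

    // Hypothetical markup: the class names come from the thread,
    // the rows and values are invented for illustration
    static final String SAMPLE_HTML =
        "<table class='tablehead'><tr><th>Company</th><th>High</th></tr></table>"
        + "<table class='tbldata14'>"
        + "<tr><td>Alpha Ltd</td><td>120.5</td></tr>"
        + "<tr><td>Beta Corp</td><td>88.0</td></tr>"
        + "</table>";

    // Returns "name:value" for every data row of the tables matching tableSelector
    static List<String> extractRows(Document doc, String tableSelector) {
        List<String> rows = new ArrayList<>();
        for (Element table : doc.select(tableSelector)) {
            for (Element row : table.select("tr")) {
                Elements tds = row.select("td");
                // Header rows use <th>, so they have no <td> cells and are skipped
                if (tds.size() >= 2) {
                    rows.add(tds.get(0).text() + ":" + tds.get(1).text());
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE_HTML);
        System.out.println(extractRows(doc, "table.tablehead")); // prints []
        System.out.println(extractRows(doc, "table.tbldata14")); // prints [Alpha Ltd:120.5, Beta Corp:88.0]
    }
}
```

In this sample the header table contains only `<th>` cells, so selecting it yields nothing, which mirrors the "does not print the data" symptom.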


vacuum


It gives me an index out of bounds exception and says the size of the array list is 0. – user1092042 Feb 21 '12 at 04:14
Hmm, strange. Recheck your code and try `"table.bdrtpg"` in selector string. – vacuum Feb 21 '12 at 08:15
You can also try `doc.select("div.FL")`. – vacuum Feb 21 '12 at 08:18
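If a selector matches nothing, or a row has no `<td>` cells, `row.select("td")` returns an empty list and any `tds.get(...)` call on it throws. A sketch for narrowing this down, assuming nothing about the live page: it counts matches for each selector suggested in this thread and reads cells only when a row actually has them. `SelectorProbe`, `probe`, `firstCells` and the sample markup are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class SelectorProbe {

    // Selectors suggested in the answer and its comments
    static final List<String> CANDIDATES =
        List.of("table.tbldata14", "table.bdrtpg", "div.FL");

    // Counts how many elements each candidate selector matches
    static Map<String, Integer> probe(Document doc) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String css : CANDIDATES) {
            counts.put(css, doc.select(css).size());
        }
        return counts;
    }

    // Collects the first cell of each row; rows without <td> are skipped
    // instead of letting tds.get(0) throw IndexOutOfBoundsException
    static List<String> firstCells(Document doc, String css) {
        List<String> cells = new ArrayList<>();
        for (Element row : doc.select(css).select("tr")) {
            Elements tds = row.select("td");
            if (!tds.isEmpty()) {
                cells.add(tds.first().text());
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        // Invented markup using the class names mentioned in the comments
        Document doc = Jsoup.parse(
            "<div class='FL'><table class='bdrtpg'>"
            + "<tr><th>Company</th></tr><tr><td>Alpha Ltd</td></tr>"
            + "</table></div>");
        System.out.println(probe(doc));
        System.out.println(firstCells(doc, "table.bdrtpg"));
        System.out.println(firstCells(doc, "table.tbldata14"));
    }
}
```

Running the probe against the real `doc` first shows which selector, if any, matches before you index into the cells.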

score 0 · Answer 2 · answered Jun 14 '13 at 06:00

One important thing is to check what you are actually getting in `doc` after parsing the HTML, because the page may not contain the data you expect. For example:
1. The site might be using iframes to display the content.
2. The content might be rendered via JavaScript.
3. A few sites have scripts that do not allow jsoup parsing, so `doc` will contain unrelated data.
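A rough way to run these checks on a parsed `Document`; the `FetchedDocCheck` class, the `diagnose` helper and the sample HTML are hypothetical. It relies on two properties of jsoup: fetching a page does not load the contents of its iframes, and jsoup does not execute JavaScript, so content produced either way is missing from `doc`.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class FetchedDocCheck {

    // Returns hints about why expectedSelector might find nothing in doc
    static List<String> diagnose(Document doc, String expectedSelector) {
        List<String> hints = new ArrayList<>();
        if (doc.select(expectedSelector).isEmpty()) {
            hints.add("no match for " + expectedSelector);
        }
        // Iframe content is a separate document that must be fetched from its own URL
        for (Element frame : doc.select("iframe[src]")) {
            hints.add("iframe: " + frame.attr("abs:src"));
        }
        // Scripts are kept as text, never run, so anything they would build is absent
        int scripts = doc.select("script").size();
        if (scripts > 0) {
            hints.add("script tags: " + scripts);
        }
        return hints;
    }

    public static void main(String[] args) {
        String html = "<html><body>"
            + "<iframe src='/data/table.html'></iframe>"
            + "<script>document.write('<table class=\"tbldata14\"></table>')</script>"
            + "</body></html>";
        // The base URI lets abs:src resolve the relative iframe URL
        Document doc = Jsoup.parse(html, "http://example.com/page");
        diagnose(doc, "table.tbldata14").forEach(System.out::println);
    }
}
```

Printing `doc.html()` alongside these hints is often the quickest way to see what jsoup actually received.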
