
I am using jsoup to extract data from a table on this page: http://www.moneycontrol.com/stocks/marketstats/gainerloser.php?optex=BSE&opttopic=topgainers&index=-1. I have referred to "Using JSoup To Extract HTML Table Contents" and other similar questions, but my code does not print the data. Could someone please provide the code required to achieve this?

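import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class TestClass {

    public static void main(String[] args) throws IOException {
        // Fetch and parse the top-gainers page
        Document doc = Jsoup.connect("http://www.moneycontrol.com/stocks/marketstats/gainerloser.php?optex=BSE&opttopic=topgainers&index=-1").get();

        for (Element table : doc.select("table.tablehead")) {
            for (Element row : table.select("tr")) {
                Elements tds = row.select("td");
                // Only rows with more than six cells are printed
                if (tds.size() > 6) {
                    System.out.println(tds.get(0).text() + ":" + tds.get(1).text());
                }
            }
        }
    }
}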
user1092042
  • It would probably be helpful to see your code in order to help you... – quaylar Feb 20 '12 at 09:30
  • The [Terms of Use](http://www.moneycontrol.com/cdata/termsofuse.php) suggest that such acts are not permitted without the express written permission of moneycontrol.com. If you have their permission, ask them about their preferred API for accessing the data. For example, I notice one of the links mentions RSS feeds, which are a much more machine-friendly form of information than HTML. – Andrew Thompson Feb 20 '12 at 09:39
  • I would like to get the names of the top gainers in the table. I have to tweak the code a little, but I don't know exactly what to change, as I am new to jsoup. – user1092042 Feb 20 '12 at 13:00

2 Answers


If you want the contents of the table (not the header), you need to change the table selector. Use

for (Element table : doc.select("table.tbldata14"))

instead of

 for (Element table : doc.select("table.tablehead"))
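
Put together, the corrected loop might look like the sketch below. It is only a sketch under some assumptions: the `tbldata14` class is taken from this answer and may not match the live page any more, the class name `TopGainers`, the helper `extractRows`, and the inline sample HTML are made up for illustration, and the cell-count check is relaxed from `> 6` to `> 1` so the small sample table passes it.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class TopGainers {

    // Hypothetical helper: returns "first cell:second cell" for every data row
    // in tables carrying the class named in the answer (tbldata14).
    static List<String> extractRows(Document doc) {
        List<String> rows = new ArrayList<>();
        for (Element table : doc.select("table.tbldata14")) {
            for (Element row : table.select("tr")) {
                Elements tds = row.select("td");
                // Header rows built from <th> have no <td> cells and are skipped.
                if (tds.size() > 1) {
                    rows.add(tds.get(0).text() + ":" + tds.get(1).text());
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample markup standing in for the real page.
        String html = "<table class=\"tbldata14\">"
                + "<tr><th>Company</th><th>High</th></tr>"
                + "<tr><td>Acme Ltd</td><td>101.50</td></tr>"
                + "<tr><td>Example Corp</td><td>42.00</td></tr>"
                + "</table>";
        Document doc = Jsoup.parse(html);
        for (String line : extractRows(doc)) {
            System.out.println(line);
        }
    }
}
```

To run it against the live page, `Jsoup.parse(html)` would be swapped for `Jsoup.connect(url).get()`, with `main` declaring `throws IOException`.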
vacuum

An important first step is to check what you actually get in the Document when you parse the HTML, because there may be a few problems with it:

1. The site might use iframes to display the content.
2. The content might be rendered via JavaScript.
3. Some sites have scripts that block jsoup parsing, so the Document will contain random data.
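
One rough way to run that check is to count the relevant elements in the parsed Document before writing any extraction code. The sketch below assumes a made-up sample page; the class name `DocInspector` and the helper `summarize` are hypothetical and not part of jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class DocInspector {

    // Hypothetical helper: reports how many iframes, scripts, and matches
    // for the given selector are present in what jsoup actually parsed.
    static String summarize(Document doc, String selector) {
        int iframes = doc.select("iframe").size();
        int scripts = doc.select("script").size();
        int matches = doc.select(selector).size();
        return "iframes=" + iframes + " scripts=" + scripts + " matches=" + matches;
    }

    public static void main(String[] args) {
        // Made-up markup: the data lives in an iframe or is rendered by a script,
        // so the table we are looking for is not in the fetched HTML.
        String html = "<html><body><iframe src=\"data.html\"></iframe>"
                + "<script>render();</script><div id=\"content\"></div></body></html>";
        Document doc = Jsoup.parse(html);
        System.out.println(summarize(doc, "table.tbldata14"));
    }
}
```

If the selector matches nothing while iframes or scripts are present, the data is probably loaded separately, and printing `doc.outerHtml()` shows exactly what jsoup received.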

Rahul