1

I have an old Java program that used to get data from an HTML page. It worked fine a few years ago, but now when I run it there is no data. The page link is:

http://www.batstrading.com/book/ibm/

My Java program still retrieves the HTML table, but the table contains no data. If you open that page in a browser, you can see the data changing dynamically. Why?

The HTML text my Java program now gets from the page matches what the browser's view source shows, and looks like this:

    <tbody>
      <tr>
        <td class="shares">&nbsp;</td>
        <td class="price">&nbsp;</td>
      </tr>

Instead of data, the cells contain only &nbsp;

How can I fix my code to get the data? To be clear, there is nothing wrong with the Java program itself: it gets the same text as the browser's view source. The data is missing because the page is now dynamic. So the question is how to use Java to get data from a dynamic page.

Trying Tobemyself
Frank
  • possible duplicate of [Apache HttpClient 4 And JavaScript](http://stackoverflow.com/questions/7260282/apache-httpclient-4-and-javascript) – Luiggi Mendoza Jul 11 '13 at 15:47
  • The page is running javascript. Use a tool like firebug to analyze the request that is being sent and simulate it from your java application. – Sotirios Delimanolis Jul 11 '13 at 15:48
  • possible duplicate of [Evaluate javascript on a local html file (without browser)](http://stackoverflow.com/questions/16375251/evaluate-javascript-on-a-local-html-file-without-browser) – Matt Ball Jul 11 '13 at 15:48

2 Answers

2

Scrap the current approach, since the site is updated via JavaScript. You won't be able to just download the HTML and make it work.

However, a much easier approach (than using Selenium or a JS engine) is to request the source data that the JavaScript uses to update the page:

http://www.batstrading.com/json/bzx/book/IBM

It's perfectly valid JSON. Request that link with your HTTP client and parse the JSON using Jackson. This will yield very reliable results.
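As a rough sketch of the request step, assuming Java 11 or later so that the built-in `java.net.http` client is available (an older program would use whatever HTTP client it already has): the class name `BookFetcher` and its two helper methods are hypothetical, and the only thing taken from this answer is the endpoint URL, which may have moved or changed since.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Locale;

// Hypothetical helper class; not part of any library.
class BookFetcher {
    // Base of the JSON endpoint quoted in this answer; assumed to still be valid.
    static final String BASE = "http://www.batstrading.com/json/bzx/book/";

    // Builds the endpoint URI for a ticker symbol, e.g. "ibm" -> .../book/IBM
    static URI bookUri(String symbol) {
        return URI.create(BASE + symbol.trim().toUpperCase(Locale.ROOT));
    }

    // Downloads the raw JSON text for the symbol; throws if the server
    // answers with anything other than HTTP 200.
    static String fetchBook(String symbol) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(10))
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(bookUri(symbol))
                .header("Accept", "application/json")
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }
}
```

Calling `BookFetcher.fetchBook("ibm")` returns the response body as a string. With Jackson on the classpath, that string can be passed to `new ObjectMapper().readTree(body)` to get a `JsonNode` tree. The field names inside the JSON are not shown here, so inspect the actual response before writing the code that walks the tree.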

Disclaimer: Make sure that what you are doing complies with the Terms of Service of the website you are using. Otherwise you expose yourself to legal issues.

Colin M
  • Personally I feel that learning to use powerful tools that will work in every situation is a better solution than assuming other sites will be as nice as this, but if this is really the limit it's probably a better approach for simplicity's sake. – Slater Victoroff Jul 11 '13 at 15:54
  • @SlaterTyranus I believe in using the right tool for the job. In this particular job, Selenium is overkill. But yes, it's a phenomenal tool for other cases (such as QA testing, or screen scraping sites without such friendly JSON) – Colin M Jul 11 '13 at 15:57
0

You can't do this by directly downloading the page, so you've got two options. Personally, I would use CasperJS or Selenium to interact with the JavaScript on the page. Otherwise you have to manually simulate what the JavaScript is doing, which is generally neither long-lasting nor scalable (read: it will break once they change anything about their site).

These tools will emulate a browser and let you wait until certain elements load.

There are a number of other browser-emulating tools of this kind, but I would highly recommend Casper: it's fast, easy to use, and easy to call even from within your Java program, since it's just JavaScript. See this for instructions on calling JavaScript from Java.

Slater Victoroff