
How can I extract an HTML table's data from a webpage without downloading the whole of the webpage's HTML?

I use TWebBrowser and TEmbeddedWB in Delphi XE2 to load the page and then get the table element and parse it, but the webpage is very heavy and in my loop (about 60 seconds) I can't grab the data correctly.

Regards

SadeghAlavizadeh
  • You can download it via a stream, and discard the bits you don't need, which will be more efficient than loading the whole thing into memory. However, there is no way just to request specific bits unless a server supports it, which generally web servers don't. – mellamokb Aug 08 '12 at 15:02
  • And even then, you need to know where the start and end bytes for the table are. – Quentin Aug 08 '12 at 15:04
  • How can I download it via a stream, and how can I parse it? Is it an HTML document that I can parse with MSHTML, or not? – SadeghAlavizadeh Aug 08 '12 at 15:47
  • Does your end user also need to see the web page? If not, use the Indy HTTP client (TIdHttp) component and do a GET or POST. That will get you the page in stream form, without the overhead of the browser parsing and rendering the HTML (see the first sketch after these comments). – Sam M Aug 08 '12 at 18:29
  • Synapse is also a nice HTTP library. Your question is not clear: what do you call a webpage? Only the root HTML page, or hundreds of files: HTML, CSS styles, JS programs, Flash advertisements, HTML sub-frames, music, video and who knows what more? The HTTP protocol lets you download only part of a file, say from byte #12345 to byte #54321, but 1: not all servers allow that, and 2: how can you know which bytes you need without downloading the page? So you would still have to download the main HTML file, but you can avoid downloading all or most of the auxiliary files (a Range-request sketch follows these comments). – Arioch 'The Aug 09 '12 at 12:31
  • You can also ask the server to send the page zipped or gzipped, if the server knows how to pack it and your program knows how to unpack it, reducing traffic a bit. But if your table is not in the main file but in some auxiliary file, you would have to download that file instead and work out its address. There are many libraries for downloading a file over HTTP: Indy, Synapse, Microsoft BITS, the JediCodeLib HTTP grabber, and more. Then you need an HTML parser (XML parsers will parse valid XHTML pages, but may refuse to parse an HTML page or a broken real-world XHTML page). You can search for HTML parsers here or on torry.net. – Arioch 'The Aug 09 '12 at 12:34
  • Thanks, I asked another question: http://stackoverflow.com/questions/11915903/convert-string-form-idhttp-get-to-ihtmldocument2-in-delphi#comment15865758_11915903 – SadeghAlavizadeh Aug 11 '12 at 15:38
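
Following up on Sam M's TIdHttp suggestion and Arioch's note about compression, here is a minimal sketch of fetching just the root HTML with Indy and no browser at all. It assumes the Indy 10 that ships with Delphi XE2; the function name is made up for illustration.

    uses
      IdHTTP, IdCompressorZLib, SysUtils;

    // Illustrative helper: fetch only the root HTML document of a page.
    function FetchPageHtml(const AUrl: string): string;
    var
      Http: TIdHTTP;
    begin
      Http := TIdHTTP.Create(nil);
      try
        // Optional: let Indy negotiate gzip/deflate so less data travels
        // over the wire (the server may or may not honour it).
        Http.Compressor := TIdCompressorZLib.Create(Http);
        // A plain GET returns only this one file; unlike TWebBrowser, no
        // images, CSS, scripts, frames or ads are fetched or rendered.
        Result := Http.Get(AUrl);
      finally
        Http.Free; // the owned compressor is freed along with it
      end;
    end;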
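
Arioch's byte-range point can also be tried with Indy by sending a Range header yourself, though, as he notes, many servers ignore it and you still would not know which bytes hold the table. A hedged sketch; the helper name and byte count are arbitrary:

    uses
      IdHTTP, SysUtils;

    // Illustrative helper: ask the server for only the first ACount bytes.
    function FetchFirstBytes(const AUrl: string; ACount: Integer): string;
    var
      Http: TIdHTTP;
    begin
      Http := TIdHTTP.Create(nil);
      try
        // Raw Range header; a server that ignores it simply sends the whole
        // document (response code 200 instead of 206 Partial Content).
        Http.Request.CustomHeaders.Values['Range'] :=
          Format('bytes=0-%d', [ACount - 1]);
        Result := Http.Get(AUrl);
        // Http.Response.ResponseCode = 206 means the range was honoured.
      finally
        Http.Free;
      end;
    end;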

1 Answer


Man,

Since HTML is interpreted markup rather than a compiled format, the browser, or any class implementation that works like a browser, needs to download the entire content from the server. You can learn more about this by reading How Browsers Work and Surfin' Safari. I don't believe there is an effective way to do exactly what you asked; in my opinion, the most efficient method is the one mellamokb described, but even that still downloads the entire content.
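
To make that concrete, here is a rough sketch of mellamokb's idea in Delphi, assuming the page has already been downloaded as a string (for example with TIdHTTP.Get as in the sketches above). It loads the string into an in-memory MSHTML document and keeps only the first table's markup, discarding the rest; the helper name and the "first table" choice are illustrative only.

    uses
      MSHTML, ActiveX, Variants, SysUtils;

    // Illustrative helper: pull the first <table> out of already-downloaded
    // HTML without TWebBrowser. In a non-VCL context call CoInitialize first.
    function ExtractFirstTableHtml(const APageHtml: string): string;
    var
      Doc: IHTMLDocument2;
      V: OleVariant;
      Tables: IHTMLElementCollection;
    begin
      Result := '';
      // Load the string into an in-memory HTML document; nothing is rendered
      // and no auxiliary files (images, CSS, JS) are requested.
      Doc := CoHTMLDocument.Create as IHTMLDocument2;
      V := VarArrayCreate([0, 0], varVariant);
      V[0] := APageHtml;
      Doc.write(PSafeArray(TVarData(V).VArray));
      Doc.close;

      Tables := Doc.all.tags('table') as IHTMLElementCollection;
      if Tables.length > 0 then
        Result := (Tables.item(0, 0) as IHTMLElement).outerHTML;
      // Everything else in the document is simply discarded.
    end;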

Cheers,

Rodrigo Reis