I am using the TIdHTTP component to fetch web pages. It works fine for the main page, but it does not retrieve content generated by embedded JavaScript code. A good example is a page that lets users add comments via Disqus.

Here is an example

In this scenario, the TIdHTTP.Get method does not retrieve the comments at the bottom of the page.

Is there any way this could be done with the Indy component or any other native component?

I have experimented with the TWebBrowser OLE control, but I would prefer to use native Delphi code.

M Schenkel
  • You appear to have forgotten what you learned two months ago when you asked, "[Can Indy run Javascript?](http://stackoverflow.com/questions/2249880/can-indy-run-javascript)" – Rob Kennedy Apr 11 '11 at 22:28
  • @RobKennedy - good observation (but it was actually 1 year and 2 months ago). Ironically, I more or less asked the same question, but the context was totally different. In this case I want all the JavaScript to execute (on its own) and render the entire page as if it were in a browser. In that other case I was looking to interact with a Flash component and have it fire certain methods. In any event, sorry for the duplicate question. – M Schenkel May 29 '11 at 03:29

2 Answers

TIdHTTP will not execute JavaScript, because TIdHTTP is NOT a browser. You would need to add a JavaScript engine to your application to execute the scripts in the retrieved HTML source, as a browser would.

Your example is about displaying Google Analytics stats... is this what you're trying to do in your application? If so, you're being foolish (without meaning to be offensive) trying to hack it together using a pre-rendered result.

Google Analytics provides an API specifically so you can harvest information using HTTP calls. Once that information is retrieved, you can then display it using native Delphi components and code in any imaginative or original way you desire. Because GA provides an API, there's no excusable reason to do it the way you appear to be attempting.

LaKraven
  • Thanks - I kind of figured I would need something to parse/execute the JavaScript. I will start searching for such a component. – M Schenkel Apr 11 '11 at 20:30
  • Regarding your comments about "scraping" Google Analytics: no, that is not what I am trying to do. Ironically, I offer a suite of products to do exactly this (click on my name and follow the link). One of my services (InboundLinkAlerts) is based on the GA API. It queries for new links to your site, and then I want to search those pages to verify that the link exists. I am using links to my own site as a test (www.embeddedanalytics.com), and I found it did not pick up my comments. – M Schenkel Apr 11 '11 at 20:33
  • Glad to hear you're not trying to perform "scraping". Sorry if my answer seemed a little "harsh", though no self-respecting developer could abide the idea of scraping pages in that way. – LaKraven Apr 11 '11 at 20:37
  • also... if you're not scraping stats, what exactly are you trying to do that would require the execution of JavaScripts in returned HTML content? – LaKraven Apr 11 '11 at 20:37
  • As I said, the product "InboundLinkAlerts" verifies that a link to your page exists on another page. If you look at the example page, I have a comment there with a link to my site www.embeddedanalytics.com. But simply calling TIdHTTP.Get does not retrieve the content that is generated by the Disqus JavaScript code, so I have no way of verifying that the link exists. – M Schenkel Apr 11 '11 at 20:41
  • Ah okay... I understand you wanting to avoid the use of TWebBrowser (OLE objects are a royal pain)... still, if you are forced to fall back on that, and if you need it, I have a fully-functional routine for extracting all hyperlinks from a target page (using TWebBrowser). Personally I know of only one JavaScript suite for Delphi, though I haven't used it in ~18 months. I'll find it and send you a link to it (if it's still an active project)... a moment please... – LaKraven Apr 11 '11 at 20:48
  • Okay, so I can no longer find the specific library I was using... however I did turn up this: http://code.google.com/p/extpascal/ – LaKraven Apr 11 '11 at 20:51
  • LaKraven - you said you had a "fully-functional routine for extracting all hyperlinks from a target page (using TWebBrowser)." I am still searching for a solution. Is this something you could provide me? Would it also extract links from the JavaScript-generated Disqus comments? – M Schenkel May 29 '11 at 03:35
  • Sorry for the late reply: I can make this code available, certainly... I've no idea if it'll take links from a JS-generated portion of a page, though... that would warrant some testing. I'll get back to you once I've logged today's billable hours for one of my clients. Feel free to add me on Skype, by the way (same name as my SO username) – LaKraven May 31 '11 at 07:37

No, of course this doesn't work. The Get function simply obtains the text returned by the web server. It doesn't even know what type of text is returned: it could be an HTML page, a plain-text file, or some completely unknown sequence of bytes. In the case of an HTML page, therefore, all you get is the plain HTML source, including any client-side scripts. Indeed, the scripts are merely textual content embedded in the HTML code inside <script> tags. It is up to you to execute them, like a web browser does after it has downloaded the HTML code!
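To make the consequence for link checking concrete, here is a minimal sketch, assuming Indy's TIdHTTP over plain HTTP (an HTTPS URL would additionally need an SSL IOHandler). The helper name SourceContainsLink and the example URL are hypothetical and only serve the illustration. It searches the raw response text, so anything a script would insert later in a browser, such as Disqus comments, cannot be found this way.

```delphi
program CheckLinkInSource;

{$APPTYPE CONSOLE}

uses
  SysUtils, IdHTTP;

// Hypothetical helper: returns True if LinkText occurs anywhere in the raw
// text sent by the server. Nothing in that text is parsed or executed.
function SourceContainsLink(const PageUrl, LinkText: string): Boolean;
var
  Http: TIdHTTP;
  Source: string;
begin
  Http := TIdHTTP.Create(nil);
  try
    // Get returns the response body as a string, <script> blocks included.
    Source := Http.Get(PageUrl);
    // Case-insensitive substring search over the static source.
    Result := Pos(LowerCase(LinkText), LowerCase(Source)) > 0;
  finally
    Http.Free;
  end;
end;

begin
  try
    // Placeholder URL, used for illustration only.
    if SourceContainsLink('http://example.com/some-post',
      'www.embeddedanalytics.com') then
      Writeln('Link found in the static HTML')
    else
      Writeln('Link not in the static HTML (it may be added by a script)');
  except
    on E: Exception do
      Writeln('Request failed: ', E.Message);
  end;
end.
```

A negative result here only says the link is absent from the static source; it says nothing about what a browser would show after running the page's scripts.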

Andreas Rejbrand
  • I figured. So I guess the next step is to find a "JavaScript" execution component. – M Schenkel Apr 11 '11 at 20:29
  • See the following questions: "[How can I execute Javascript in my Delphi program without TWebBrowser?](http://stackoverflow.com/questions/4424117/how-can-i-execute-javascript-in-my-delphi-program-without-twebbrowser)" and "[Can Indy run Javascript?](http://stackoverflow.com/questions/2249880/can-indy-run-javascript)" (Answers to the former mention SpiderMonkey [Mozilla] and V8 [Google]; my answer to the latter mentions using the Microsoft Script Control ActiveX object for script execution.) – Rob Kennedy Apr 11 '11 at 22:32
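For readers curious about the Microsoft Script Control route mentioned above, a rough sketch follows. It assumes a 32-bit Windows build with the control registered under the ProgID MSScriptControl.ScriptControl, and the expression being evaluated is just an illustration. Note that the control is a bare script engine: it has no browser DOM (no document or window objects), so a script written for a web page, such as the Disqus embed code, would most likely not run in it unchanged.

```delphi
program EvalJScript;

{$APPTYPE CONSOLE}

uses
  SysUtils, Variants, ActiveX, ComObj;

var
  Script: OleVariant;
begin
  CoInitialize(nil); // COM must be initialised in a console application
  try
    try
      // Late-bound instance of the Script Control ActiveX object.
      Script := CreateOleObject('MSScriptControl.ScriptControl');
      Script.Language := 'JScript';
      // Evaluate a trivial expression and print its result as text.
      Writeln(VarToStr(Script.Eval('"embedded" + "analytics"')));
    except
      on E: Exception do
        Writeln('Script error: ', E.Message);
    end;
  finally
    Script := Unassigned; // release the COM object before shutting COM down
    CoUninitialize;
  end;
end.
```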