I am using Delphi 10.

I am trying to get the content of this website: leforem.be. I tried using a WebBrowser control, but I could not get the full source, which is generated by a script on the page. Does anyone have an idea?

{Units needed in the uses clause: SHDocVw, MSHTML, System.DateUtils, Winapi.Windows, Vcl.Forms}
PLink := 'https://www.leforem.be/recherche-offres-emploi/jsp/index.jsp#searchurl-results/1?query=&lieu_trav=';
MyBrowser.Navigate(PLink, 4); {4 = navNoReadFromCache}

{Wait for the browser's Ready status}
while MyBrowser.ReadyState <> READYSTATE_COMPLETE do Application.ProcessMessages;
StartTime := Now;

{Wait for another 60 seconds, without burning 100% CPU}
while SecondsBetween(Now, StartTime) < 60 do
begin
  Application.ProcessMessages;
  Sleep(50);
end;

{Get the content of the browser}
document := MyBrowser.Document as IHTMLDocument2;
PBrut := document.body.innerHTML;
David K.
  • You aren't using XE10; there is no such thing. The version is probably not important here, but if you do quote it, you may as well be accurate. – David Heffernan Jan 23 '19 at 19:29
  • Indy doesn't execute client-side scripts when retrieving webpages; you would have to do that yourself, which is not trivial. As for a WebBrowser, there is no way to detect when it executes scripts, let alone when it has finished executing them. If a client-side script takes a while to run, you will just have to add some delays to your code to wait before accessing the browser's content, or prompt the user to notify your app when the browser is ready. – Remy Lebeau Jan 23 '19 at 19:42
  • You could also have a script that runs forever, or one script calling another script, so the page never completes. If we knew what you are actually trying to do with that content, we might be able to help you better. – Jerry Dodge Jan 23 '19 at 22:49

2 Answers

The short answer: there is no "all scripts completed" event, so it's not possible.

However, if you are looking for a solution rather than a flat "not possible", consider this:

Indy's TIdHTTP does not execute JavaScript at all, and it is not supposed to: its job is to perform HTTP requests (GET, POST, ...).
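To illustrate, here is a minimal console sketch, assuming the Indy 10 that ships with Delphi 10 and the OpenSSL DLLs Indy loads for HTTPS (libeay32.dll and ssleay32.dll) sitting next to the executable. The program name is made up. Whatever `Get` returns is only the HTML the server sent; nothing generated by the page's scripts is in it. Also note that the part of the question's URL after `#` is never sent to the server at all; only the page's JavaScript reads it.

```delphi
program RawHtmlDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, IdHTTP, IdSSLOpenSSL;

var
  Http: TIdHTTP;
  Ssl: TIdSSLIOHandlerSocketOpenSSL;
  Html: string;
begin
  Http := TIdHTTP.Create(nil);
  try
    // HTTPS needs an SSL IOHandler; it is owned by Http and freed with it
    Ssl := TIdSSLIOHandlerSocketOpenSSL.Create(Http);
    Ssl.SSLOptions.SSLVersions := [sslvTLSv1_2];
    Http.IOHandler := Ssl;
    // Plain HTTP GET: the response is the server-side HTML, no script is executed
    Html := Http.Get('https://www.leforem.be/recherche-offres-emploi/jsp/index.jsp');
    Writeln('Received ', Length(Html), ' characters of server-side HTML');
  finally
    Http.Free;
  end;
end.
```

Comparing this output with what the browser shows after its scripts have run makes it clear why TIdHTTP alone cannot produce the search results.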

Browsers, by contrast, have a built-in JS engine to run client-side scripts. The problem is that scripts can keep running indefinitely, possibly with pauses in between. Browsers only expose a "DOM is loaded" event, and many websites attach code to this event that goes on to run further JS.

Most websites run a series of client-side scripts that transform the DOM after the "DOM ready" event. Only after these have finished can we reasonably consider the page ready to be read by human users or web scrapers.

To detect this state, there are several approaches to consider:

  • A timer. The simplest approach, but not the best: load the page and wait a fixed time. Network problems or later changes to the page may make the required time longer or shorter, and excessive waiting wastes execution time.
  • A periodic DOM element/property check. Scripts often add certain properties or elements once the needed state is reached. Analyze the fully loaded website to find one.
  • Busy or ReadyState. TWebBrowser, and the OLE object created with B := CreateOleObject('InternetExplorer.Application');, both have Busy and ReadyState properties. If the browser stays not Busy for some time, you may consider the page complete.
  • An intelligent combination of the above. For example, Browser.Busy with a timeout may do the trick. If you target one specific site, an element lookup may work. This is the preferred way to go.

With this in mind, you can define your own function, NavigateAndWaitComplete(URL, [Element], Timeout), that does the magic.
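One possible shape for such a function, offered only as a sketch: it combines the Busy/ReadyState check with an optional element lookup and a timeout. The unit name `BrowserWait`, the `ElementId` parameter and the millisecond timeout are assumptions, not part of any library. You have to find the element id to wait for yourself by inspecting the fully loaded page.

```delphi
unit BrowserWait;

interface

uses
  System.SysUtils, Winapi.Windows, Vcl.Forms, SHDocVw, MSHTML;

// Hypothetical helper: navigates, then waits until the browser is idle and,
// if ElementId is not empty, until a script has created that element.
// Returns False if TimeoutMs elapses first.
function NavigateAndWaitComplete(Browser: TWebBrowser; const URL: string;
  const ElementId: string; TimeoutMs: Cardinal): Boolean;

implementation

function NavigateAndWaitComplete(Browser: TWebBrowser; const URL: string;
  const ElementId: string; TimeoutMs: Cardinal): Boolean;
var
  StartTick: Cardinal;
  Doc: IHTMLDocument3;
begin
  Result := False;
  Browser.Navigate(URL);
  StartTick := GetTickCount;
  while GetTickCount - StartTick < TimeoutMs do
  begin
    // Let the control process its messages so the page's scripts keep running
    Application.ProcessMessages;
    if (not Browser.Busy) and (Browser.ReadyState = READYSTATE_COMPLETE) then
    begin
      // Without an element to look for, "loaded and not busy" is the best signal available
      if ElementId = '' then
        Exit(True);
      // Otherwise wait until the page's scripts have inserted the expected element
      if Supports(Browser.Document, IHTMLDocument3, Doc) and
        (Doc.getElementById(ElementId) <> nil) then
        Exit(True);
    end;
    Sleep(50); // poll gently instead of spinning at 100% CPU
  end;
end;

end.
```

For the question's case it could be called as `if NavigateAndWaitComplete(MyBrowser, PLink, 'results', 60000) then PBrut := (MyBrowser.Document as IHTMLDocument2).body.innerHTML;`, where `'results'` is a placeholder for whatever id the search results container really has. The element check matters because Busy can briefly drop to False between a page's own navigations.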

Marcodor
  • Thanks for this. Even with a timer waiting up to a minute, I am still not getting the content of the webpage... – David K. Jan 24 '19 at 18:32

Finally, I found the solution. By default, the WebBrowser control hosted in a Delphi application runs in IE7 emulation mode. I had to switch the WebBrowser component to IE11 emulation, and then it worked fine.
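The answer doesn't say exactly how the emulation mode was changed. One common way, shown here as a sketch under that assumption, is the per-user FEATURE_BROWSER_EMULATION registry value: a DWORD named after the executable, where 11001 requests IE11 edge mode. The unit and procedure names are hypothetical. The value is most reliably picked up if it is written before the first TWebBrowser is created, for example at the start of the .dpr file.

```delphi
unit BrowserEmulation;

interface

// Hypothetical helper: opts the current executable into IE11 rendering
// for hosted WebBrowser controls (per user, so no admin rights are needed).
procedure SetBrowserEmulationIE11;

implementation

uses
  System.SysUtils, System.Win.Registry, Winapi.Windows;

procedure SetBrowserEmulationIE11;
const
  FeatureKey = 'Software\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_BROWSER_EMULATION';
  IE11Edge = 11001; // IE11 edge mode, regardless of the page's !DOCTYPE
var
  Reg: TRegistry;
begin
  Reg := TRegistry.Create(KEY_WRITE);
  try
    Reg.RootKey := HKEY_CURRENT_USER;
    if Reg.OpenKey(FeatureKey, True) then
    try
      // The value name is the bare file name of the exe, e.g. Project1.exe
      Reg.WriteInteger(ExtractFileName(ParamStr(0)), IE11Edge);
    finally
      Reg.CloseKey;
    end;
  finally
    Reg.Free;
  end;
end;

end.
```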

David K.
  • On Windows, it's better to use the OLE object `InternetExplorer.Application` instead of TWebBrowser. It points to the latest installed version and does not take the registry emulation hacks into account. – Marcodor Jan 25 '19 at 08:23
  • Marcodor, thanks for the hint. – David K. Jan 25 '19 at 17:22
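Following Marcodor's suggestion, here is a minimal console sketch of driving Internet Explorer through OLE automation instead of TWebBrowser. The program name, the 60-second timeout and the 100 ms polling interval are assumptions. Like any ReadyState check, this only tells you that the document has loaded, so content added later by scripts may still need an element check as described in Marcodor's answer.

```delphi
program IEAutomationDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Variants, System.Win.ComObj, Winapi.Windows, Winapi.ActiveX;

// True once IE is idle and the document reports READYSTATE_COMPLETE (4)
function IsReady(IE: OleVariant): Boolean;
var
  Busy: Boolean;
  State: Integer;
begin
  Busy := IE.Busy;
  State := IE.ReadyState;
  Result := (not Busy) and (State = 4);
end;

var
  IE: OleVariant;
  StartTick: Cardinal;
  Html: string;
begin
  CoInitialize(nil); // a console app has to initialize COM itself
  try
    IE := CreateOleObject('InternetExplorer.Application');
    try
      IE.Visible := False;
      IE.Navigate('https://www.leforem.be/recherche-offres-emploi/jsp/index.jsp#searchurl-results/1?query=&lieu_trav=');
      StartTick := GetTickCount;
      // Poll until ready or until 60 s have passed; scripts may still run afterwards
      while not IsReady(IE) and (GetTickCount - StartTick < 60000) do
        Sleep(100);
      Html := IE.Document.body.innerHTML;
      Writeln('Got ', Length(Html), ' characters of rendered HTML');
    finally
      IE.Quit; // close the hidden IE process
    end;
  finally
    CoUninitialize;
  end;
end.
```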