
I am trying to write a program in PHP that downloads and parses HTML source. The problem arises when I try to download HTML that is generated by JavaScript.
Is there any way to download the page after the onload() function has completed?

Thanks

  • share some code with what you are trying to do – karthikr Jun 11 '13 at 17:03
  • Are you trying to do this yourself to get test cases or in code? – BlargleMonster Jun 11 '13 at 17:04
  • So you are trying to build a scraper with a PHP client and are hoping PHP can somehow run JavaScript? – Mike Brant Jun 11 '13 at 17:05
  • No. `onload` and other JavaScript functions are executed by the browser and operate in memory. No other source is generated, just changes in the DOM tree. – dev-null-dweller Jun 11 '13 at 17:05
  • Well, there is a page, http://www.fler.cz/zbozi/moda/pradlo, where the list of products is generated on page load. What I need is to get the HTML code after the list is generated. – user2475460 Jun 11 '13 at 18:00
  • When we navigate to that url, the browser runs JavaScript, which appends html elements to the DOM. Are you writing your script within the context of that website (in other words, when a user navigates to that site in their browser)? Or do you want to write a script that can be run from, say, a command prompt or from within an application running on a computer or server (not in a web browser)? – benastan Jun 11 '13 at 18:13
  • Yes, I am writing an application which will run on a different server (that is why PHP), and I have no access to fler.cz. – user2475460 Jun 11 '13 at 18:55

1 Answer


This is not a trivial task, because JavaScript is active code that has to be interpreted by the browser. What you get from the server is the raw HTML; everything the JavaScript does happens on the client side, completely out of the hands of the server that delivers the page. You can't solve this in general with static analysis (i.e. guessing what will happen by looking at the code without executing it). The only reliable way to do this is to actually execute the JavaScript.
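To illustrate why a plain download is not enough, here is a minimal sketch. The HTML string is a made-up stand-in for what a server might send (it is not the real fler.cz source), and `countProducts` is a hypothetical helper. PHP's DOM parser reads the markup but never runs the script, so the list stays empty.

```php
<?php
// Hypothetical page source: the server sends an empty list plus a script
// that a browser would run on load to fill the list in.
$serverHtml = <<<'HTML'
<html><body>
<ul id="products"></ul>
<script>
window.onload = function () {
    var li = document.createElement('li');
    li.textContent = 'Product A';
    document.getElementById('products').appendChild(li);
};
</script>
</body></html>
HTML;

// Count the <li> elements inside #products using PHP's DOM parser only.
// The parser treats the <script> body as plain text and does not execute it.
function countProducts(string $html): int
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    return $xpath->query('//ul[@id="products"]/li')->length;
}

echo countProducts($serverHtml), "\n"; // prints 0: no <li> exists until the script runs
```

The same thing happens with any HTTP download from PHP: the response is the markup as sent, before any `onload` handler has touched it.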

That being said, you probably don't want to write your own JavaScript interpreter from scratch. There are "headless" implementations that include a JavaScript interpreter just like a browser's but don't display anything on screen; they perform all operations on a virtual DOM. Try looking into PhantomJS.

EDIT: See this question, where someone basically does what you are asking. I think it should work as-is for your case.

I don't know of any "pure" PHP solutions, but you could easily automate running the script from PHP. If you need to stay in PHP for any reason, "headless DOM renderer" is what I would search for.
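Automating it from PHP could look roughly like the sketch below. It assumes a `phantomjs` binary is installed and on the PATH. The names `RENDER_SCRIPT`, `buildCommand` and `fetchRenderedHtml` are made up for this example, and the fixed two-second wait is only a guess at how long the page's `onload` work takes.

```php
<?php
// PhantomJS script (JavaScript), kept as a string so the example is one file.
// It opens the URL passed as the first argument and prints the DOM as HTML.
const RENDER_SCRIPT = <<<'JS'
var page = require('webpage').create();
var system = require('system');
page.open(system.args[1], function (status) {
    if (status !== 'success') {
        phantom.exit(1);
        return;
    }
    // Assumption: a fixed delay is enough for the onload handlers to finish.
    setTimeout(function () {
        console.log(page.content);
        phantom.exit(0);
    }, 2000);
});
JS;

// Build the shell command, escaping every part so the URL cannot break out.
function buildCommand(string $binary, string $scriptPath, string $url): string
{
    return escapeshellarg($binary) . ' '
        . escapeshellarg($scriptPath) . ' '
        . escapeshellarg($url);
}

// Run PhantomJS and return the rendered HTML, or null if nothing came back.
function fetchRenderedHtml(string $url, string $binary = 'phantomjs'): ?string
{
    // Write the script to a temporary .js file for PhantomJS to execute.
    $tmp = tempnam(sys_get_temp_dir(), 'render');
    $scriptPath = $tmp . '.js';
    rename($tmp, $scriptPath);
    file_put_contents($scriptPath, RENDER_SCRIPT);

    $output = shell_exec(buildCommand($binary, $scriptPath, $url));
    unlink($scriptPath);

    return is_string($output) && $output !== '' ? $output : null;
}
```

Calling `fetchRenderedHtml('http://www.fler.cz/zbozi/moda/pradlo')` should then give a string that can be passed to `DOMDocument::loadHTML()` like any other HTML. A fixed timeout is the simplest option; polling for a specific element would be more robust but depends on the target page's markup.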

Justin L.