15

I'm fetching pages with cURL in PHP. Everything works fine, except that some parts of the page are calculated with JavaScript a fraction of a second after the page has loaded. cURL sends the page source back to my PHP script before those JavaScript calculations are done, so I end up with the wrong results. The calculations are fetched via AJAX, so I can't easily reproduce them myself. I also have no access to the target page's code, so I can't tweak it to fit my (cURL) fetching needs.

Is there any way to tell cURL to wait until all dynamic traffic has finished? It might be tricky, because some scripts keep sending data to another domain, which could cause long hangs. But then I could at least test whether I get the correct results back.

My Developer toolbar in Safari indicates the page is done in about 1.57s. Maybe I can tell cURL statically to wait for 2 seconds too?

I wonder what the possibilities are :)

4 Answers

12

With Peter's advice and some research, I found a solution. It's late, but I hope someone finds it helpful.

All you need to do is request the AJAX call directly. First, load the page you want in Chrome, open the Network tab in the developer tools, and filter by XHR.

Now find the AJAX call you want and check its response to verify that it contains the data you need.

Right-click the name of the AJAX call and select Copy -> "Copy as cURL (bash)".


Go to https://reqbin.com/curl, paste the cURL command and click Run. Check the response content.


If it's what you want, move on to the next step.

Still in the ReqBin window, click Generate code and choose the language you want the command translated into, and you will get the corresponding code. Then integrate it into your own code however you like.

Some tips: if a test run on your own server returns a 400 error or nothing at all, set POSTFIELDS to empty. If it returns 301 Moved Permanently, check whether your URL should use http or https.
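As a rough idea of what the resulting PHP might look like, here is a minimal sketch of calling an AJAX endpoint directly with cURL. The endpoint URL, the header values, and the assumption that the response is JSON are all placeholders; take the real values from the "Copy as cURL" output for your page.

```php
<?php
// Sketch: request the AJAX endpoint directly instead of the HTML page.
// All URLs and header values used with these functions are hypothetical.

/** Build the cURL options for a direct XHR-style GET request. */
function buildAjaxOptions(string $url, array $extraHeaders = []): array
{
    $headers = array_merge([
        'Accept: application/json',
        'X-Requested-With: XMLHttpRequest', // some endpoints expect this header
    ], $extraHeaders);

    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects such as http -> https
        CURLOPT_TIMEOUT        => 10,   // give up after 10 seconds
        CURLOPT_HTTPHEADER     => $headers,
    ];
}

/**
 * Execute the request and decode the JSON body.
 * Assumes the endpoint answers with JSON; throws on transport or HTTP errors.
 */
function fetchAjaxJson(string $url, array $extraHeaders = [])
{
    $ch = curl_init();
    curl_setopt_array($ch, buildAjaxOptions($url, $extraHeaders));
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    $error  = curl_error($ch);

    if ($body === false || $status >= 400) {
        throw new RuntimeException("Request failed (HTTP $status): $error");
    }
    return json_decode($body, true, 512, JSON_THROW_ON_ERROR);
}
```

A call would then look something like `$data = fetchAjaxJson('https://example.com/api/calc', ['Referer: https://example.com/']);`, where both URLs stand in for whatever the Network tab shows for your target page.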

NhanVo
  • https://reqbin.com/ sends your command to their server to convert it, which would allow them to save any cookies or other sensitive data in your command. – Boris Verkhovskiy Sep 07 '22 at 09:20
6

cURL does not execute any JavaScript or download any files referenced in the document. So cURL is not the solution for your problem.

You'll have to use a browser on the server side, tell it to load the page, wait for X seconds and then ask it to give you the HTML.

Look at: http://phantomjs.org/ (you'll need to use node.js, I'm not aware of any PHP solutions).
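For readers who want to stay in PHP, one possible approach is to shell out to a headless browser and read back the rendered DOM. The sketch below swaps in headless Chrome rather than PhantomJS, and it assumes a Chrome or Chromium binary is installed on the server; the binary name `chromium` and the 2000 ms budget are placeholders, not values from this thread.

```php
<?php
// Sketch: let a headless browser run the page's JavaScript, then read the DOM.
// Uses headless Chrome (not PhantomJS); the binary name is an assumption.

/** Build the shell command that prints the rendered DOM of $url. */
function buildDumpDomCommand(string $url, int $budgetMs = 2000, string $binary = 'chromium'): string
{
    return sprintf(
        '%s --headless --virtual-time-budget=%d --dump-dom %s',
        escapeshellcmd($binary),
        $budgetMs,              // time budget for timers and pending requests
        escapeshellarg($url)    // quote the URL so the shell cannot interpret it
    );
}

/** Run the browser and return the HTML after scripts have had time to run. */
function fetchRenderedHtml(string $url, int $budgetMs = 2000): string
{
    $html = shell_exec(buildDumpDomCommand($url, $budgetMs));
    if (!is_string($html) || $html === '') {
        throw new RuntimeException('Headless browser returned no output');
    }
    return $html;
}
```

The fixed time budget mirrors the "wait for X seconds" idea above, so it has the same weakness: a slow response can still be missed.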

Jan Hančič
  • Luckily it's just a small piece of code. I'll rewrite the code in JavaScript and fetch the data with jQuery and PhantomJS then. Thank you :) –  Jan 31 '13 at 13:00
  • Is there any way to include PhantomJS just plainly in my local HTML-page where I do my jQuery? –  Jan 31 '13 at 13:24
  • No. phantom.js uses a real webkit browser internally, which you can't do on the client. – Jan Hančič Jan 31 '13 at 13:26
3

I don't know much about the page you are retrieving or the calculations you want to include, but one option is to cURL straight to the URL serving those AJAX requests. Use something like Firebug to inspect the AJAX calls made on your target page, and you can figure out the URL and any parameters passed. If you do need the full web page, you could cURL both the web page and the AJAX URL and combine the two in your PHP code, but then it starts to get messy.
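To give an idea of what "combining the two" could look like, here is a small sketch that takes the page HTML and a value obtained from the AJAX response and writes the value into the element where the page's own script would have put it. The element id `total` and the sample value are hypothetical, and the function assumes simple ids without quote characters.

```php
<?php
// Sketch: merge a value from the AJAX response into the fetched page HTML.
// The element id and the value are hypothetical examples.

/** Return $html with the text of the element #$elementId set to $value. */
function injectValue(string $html, string $elementId, string $value): string
{
    $previous = libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $node  = $xpath->query(sprintf('//*[@id="%s"]', $elementId))->item(0);
    if ($node === null) {
        throw new RuntimeException("No element with id '$elementId' found");
    }

    $node->textContent = $value; // replaces whatever the element contained
    return $doc->saveHTML();
}
```

In practice `$html` would come from one cURL request and `$value` from a second request to the AJAX URL.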

Peter Herdenborg
2

There is one rather tricky way to achieve this with PHP. If you really want it to work in PHP, you could use a Codeception setup in conjunction with Selenium and the Chrome WebDriver in headless mode.

Here are the general steps to get it working.

  1. Make sure you have Codeception in your PHP project: https://codeception.com

  2. Download the Chrome WebDriver: https://chromedriver.chromium.org/downloads

  3. Download Selenium: https://www.seleniumhq.org/download/

  4. Configure everything following the Codeception documentation.

  5. Write a Codeception test in which you can use expressions like $I->wait(5) to wait 5 seconds, or $I->waitForJS('js expression here') to wait until a JavaScript expression on the page evaluates to true.

  6. Run the test written in the previous step with the command php vendor/bin/codecept run path/to/test
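The test from step 5 might look roughly like the sketch below. It assumes an acceptance suite with the WebDriver module already configured for headless Chrome and an `AcceptanceTester` actor class in the global namespace (newer Codeception versions namespace it); the page path, the `#total` selector, and the output file name are hypothetical.

```php
<?php
// Sketch of a Codeception Cest that waits for JavaScript, then saves the HTML.
// Assumes the WebDriver module is enabled; path and selector are hypothetical.

class RenderedPageCest
{
    public function fetchRenderedSource(AcceptanceTester $I): void
    {
        $I->amOnPage('/target-page');

        // Wait up to 10 seconds until the result element has been filled in.
        $I->waitForJS(
            'var el = document.querySelector("#total");'
            . ' return el !== null && el.textContent.trim() !== "";',
            10
        );

        // Grab the HTML as the browser sees it after the scripts have run.
        $html = $I->grabPageSource();
        file_put_contents(codecept_output_dir() . 'rendered.html', $html);
    }
}
```

Waiting on a condition rather than a fixed $I->wait(5) avoids both waiting longer than necessary and reading the page too early.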

K. Igor