
I want to extract data from a page using Symfony2's DomCrawler component. This is the page I want to get data from: http://kovv.mavari.be/kalender.aspx

However, I need the page that comes back after the form is posted, i.e. the result you get when you click 'zoek' with nothing selected in the dropdowns. Right now I have: $html = file_get_contents("http://kovv.mavari.be/kalender.aspx");

Obviously, this loads the initial page without a POST. Is there a way to load the page with a POST request, or do I need to save the page to my local drive first?

UPDATE:
This is my code now:

$post = http_build_query(array(
    'ctl00$ContentPlaceHolder1$ddlGeslacht' => 'Heren',
    'ctl00$ContentPlaceHolder1$ddlReeks' => '',
    'ctl00_ContentPlaceHolder1_ddlDatum' => ''
));

$options= array('http' => array(
    'method'  => 'POST',
    'header'  => 'Content-type: application/x-www-form-urlencoded',
    'content' => $post
));

$context  = stream_context_create($options);
$html = file_get_contents('http://kovv.mavari.be/kalender.aspx', false, $context);

But the HTML has not changed: I still get the first page, as if no POST had been sent.

UPDATE 2: This is what I have now:

$url = "http://kovv.mavari.be/kalender.aspx";
$regs=array();

$cookies = '../src/VolleyScout/VolleyScoutBundle/Resources/doc/cookie.txt';

// regular expressions to parse out the special ASP.NET
// values for __VIEWSTATE and __EVENTVALIDATION
$regexViewstate = '/__VIEWSTATE\" value=\"(.*)\"/i';
$regexEventVal  = '/__EVENTVALIDATION\" value=\"(.*)\"/i';

$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
$data=curl_exec($ch);

$viewstate = $this->regexExtract($data,$regexViewstate,$regs,1);
$eventval = $this->regexExtract($data, $regexEventVal,$regs,1);

$postData = '__VIEWSTATE='.rawurlencode($viewstate)
    .'&__EVENTVALIDATION='.rawurlencode($eventval)
    .'&ctl00_ContentPlaceHolder1_ddlGeslacht=Heren'
    .'&ctl00$ContentPlaceHolder1$ddlReeks'
    .'&ctl00_ContentPlaceHolder1_ddlDatum'
    .'&ctl00$ContentPlaceHolder1$btnZoek:zoek'
;

curl_setOpt($ch, CURLOPT_POST, TRUE);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookies);

curl_setOpt($ch, CURLOPT_POST, FALSE);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookies);

$data = curl_exec($ch);

echo $data;

curl_close($ch);

But I still get the page as it looks without a POST. Am I missing something?

nielsv
  • Change the form for `zoek` to your own script and do a curl from there to get the search results from their site. – Joran Den Houting Jan 14 '14 at 11:01
  • Maybe you need to add the button to the post as well (`ctl00$ContentPlaceHolder1$btnZoek:zoek`)? Besides the obvious fields, the page submits a lot of other information in the request (`__VIEWSTATE`, `__EVENTVALIDATION`); just look at it in a browser. Maybe there's something more than meets the eye… – nietonfir Jan 14 '14 at 11:38
  • Tried to add the btn but still no result :( – nielsv Jan 14 '14 at 12:01
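
A note on the `name:value` notation in the comment above: that is how browser developer tools display a form field, while the request body itself carries `name=value` pairs. A minimal sketch of building such a body with `http_build_query` follows. The first two dropdown names and the button name come from the question and the comment; the `$` form of `ddlDatum` is an assumption by analogy, so check the actual `name` attributes in the page source.

```php
<?php
// Field names must match the form's "name" attributes.
// Assumption: ddlDatum uses the same ctl00$...$ pattern as the other dropdowns.
$fields = [
    'ctl00$ContentPlaceHolder1$ddlGeslacht' => 'Heren',
    'ctl00$ContentPlaceHolder1$ddlReeks'    => '',
    'ctl00$ContentPlaceHolder1$ddlDatum'    => '',
    // Shown as "btnZoek:zoek" in developer tools, sent as "btnZoek=zoek".
    'ctl00$ContentPlaceHolder1$btnZoek'     => 'zoek',
];

// http_build_query URL-encodes keys and values and joins them with "&".
$body = http_build_query($fields);
echo $body, PHP_EOL;
```

Empty strings are still sent as `name=`, which is what a browser does for an unselected dropdown with an empty value. This body alone is likely not enough for an ASP.NET page, because the hidden fields mentioned in the comment are missing from it.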

1 Answer


You have to use the context parameter of file_get_contents and pass it a stream context in order to send a POST request.

$post = http_build_query(array(
    'ctl00$ContentPlaceHolder1$ddlGeslacht' => '...',
    'ctl00$ContentPlaceHolder1$ddlReeks' => '...',
    // ...
));

$options= array('http' => array(
    'method'  => 'POST',
    'header'  => 'Content-type: application/x-www-form-urlencoded',
    'content' => $post
));

$context  = stream_context_create($options);
$html = file_get_contents('http://kovv.mavari.be/kalender.aspx', false, $context);
Philipp
  • Updated my original post. I changed it to match your answer, but I still get the first page, as it looks before the post. – nielsv Jan 14 '14 at 11:29
  • You have to send `__EVENTVALIDATION` and `__VIEWSTATE` as well. These params are needed by ASP.NET. You probably have to send two requests and extract these two hidden fields from the first response to create a valid POST request. I tried it myself and it works. – Philipp Jan 14 '14 at 13:37
  • Could you give some more explanation about `__EVENTVALIDATION` and `__VIEWSTATE`? I don't really know how to implement these. – nielsv Jan 14 '14 at 13:57
  • You can't generate them yourself: they are generated by the server and more or less tamper-protected. To obtain them, send a GET request to `http://kovv.mavari.be/kalender.aspx` and use e.g. a regex to extract the contents of these two fields. – Philipp Jan 14 '14 at 14:45
  • You said you tried it yourself and it works, could you paste the code in your answer because I'm really stuck at the __EVENTVALIDATION and __VIEWSTATE. – nielsv Jan 15 '14 at 11:58
  • Updated my original post, am I missing something? – nielsv Jan 15 '14 at 13:09
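
The two-request approach described in the comments could look roughly like the sketch below. It is not the code Philipp ran. The helper names (`extractHiddenField`, `buildSearchPost`, `fetchSearchResults`) are hypothetical, and the regex assumes that `name` comes before `value` inside each hidden `<input>` tag, which is the usual ASP.NET output but has not been verified for this page. Whether the site also expects a session cookie or further hidden fields is unknown.

```php
<?php
// Sketch only: all helper names below are hypothetical.

/**
 * Return the value of a hidden <input> with the given name, or null if absent.
 * Assumes the name attribute precedes the value attribute inside the tag.
 */
function extractHiddenField(string $html, string $name): ?string
{
    // [^"]* stops at the closing quote, unlike a greedy (.*).
    $pattern = '/<input[^>]*\bname="' . preg_quote($name, '/')
             . '"[^>]*\bvalue="([^"]*)"/i';
    if (preg_match($pattern, $html, $m) === 1) {
        return html_entity_decode($m[1], ENT_QUOTES);
    }
    return null;
}

/**
 * Combine the server-generated hidden fields with the visible form fields
 * into a URL-encoded POST body.
 */
function buildSearchPost(string $pageHtml, array $formFields): string
{
    $hidden = [];
    foreach (['__VIEWSTATE', '__EVENTVALIDATION'] as $name) {
        $value = extractHiddenField($pageHtml, $name);
        if ($value !== null) {
            $hidden[$name] = $value;
        }
    }
    // http_build_query URL-encodes every key and value.
    return http_build_query($hidden + $formFields);
}

/**
 * First request (GET) fetches the form, second request (POST) submits it.
 * Returns the result HTML, or false if either request fails.
 */
function fetchSearchResults(string $url, array $formFields)
{
    $firstPage = file_get_contents($url);
    if ($firstPage === false) {
        return false;
    }
    $context = stream_context_create(['http' => [
        'method'  => 'POST',
        'header'  => 'Content-Type: application/x-www-form-urlencoded',
        'content' => buildSearchPost($firstPage, $formFields),
    ]]);
    return file_get_contents($url, false, $context);
}

// Usage (performs network requests, so it is left commented out):
// $html = fetchSearchResults('http://kovv.mavari.be/kalender.aspx', [
//     'ctl00$ContentPlaceHolder1$ddlGeslacht' => 'Heren',
//     'ctl00$ContentPlaceHolder1$btnZoek'     => 'zoek',
// ]);
```

Two details differ from the cURL attempt in the question: every field is sent as `name=value` using the `$` form of the name, and the POST is not switched off again before the second request is executed. The returned HTML can then be handed to DomCrawler.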