
I want to extract data from a page with Symfony 2's DomCrawler component. This is the page I want to get data from: http://kovv.mavari.be/kalender.aspx

I don't want the initial page, though. I want the page you get after submitting the form by clicking 'zoek' (with nothing selected in the dropdowns). At first I had: $html = file_get_contents("http://kovv.mavari.be/kalender.aspx");, but that obviously only loads the initial page, without the POST.

This is what I have now:

$post = http_build_query(array(
            'ctl00_ContentPlaceHolder1_ddlGeslacht' => 'Heren',
            'ctl00$ContentPlaceHolder1$ddlReeks'    => '',
            'ctl00_ContentPlaceHolder1_ddlDatum'    => '',
            'ctl00$ContentPlaceHolder1$btnZoek:zoek'
));

$options= array('http' => array(
    'method'  => 'POST',
    'header'  => 'Content-type: application/x-www-form-urlencoded',
    'content' => $post
));

$context  = stream_context_create($options);

$html = file_get_contents('http://kovv.mavari.be/kalender.aspx', false, $context);

From my other Stack Overflow topic, I learned that I also have to send __EVENTVALIDATION and __VIEWSTATE. But I have no idea how to get them. How can I fix this problem? (Some keywords to search for on Google would also be great!)

Update: this is what I have now, using cURL:

$url = "http://kovv.mavari.be/kalender.aspx";
$regs = array();

$cookies = '../src/VolleyScout/VolleyScoutBundle/Resources/doc/cookie.txt';

// Regular expressions to parse out the special ASP.NET
// values for __VIEWSTATE and __EVENTVALIDATION
$regexViewstate = '/__VIEWSTATE\" value=\"(.*)\"/i';
$regexEventVal  = '/__EVENTVALIDATION\" value=\"(.*)\"/i';

$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
$data=curl_exec($ch);

$viewstate = $this->regexExtract($data,$regexViewstate,$regs,1);
$eventval = $this->regexExtract($data, $regexEventVal,$regs,1);

$postData = '__VIEWSTATE='
            . rawurlencode($viewstate)
            . '&__EVENTVALIDATION='.rawurlencode($eventval)
            . '&ctl00_ContentPlaceHolder1_ddlGeslacht=Heren'
            . '&ctl00$ContentPlaceHolder1$ddlReeks'
            . '&ctl00_ContentPlaceHolder1_ddlDatum'
            . '&ctl00$ContentPlaceHolder1$btnZoek:zoek'
;

curl_setOpt($ch, CURLOPT_POST, TRUE);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookies);

curl_setOpt($ch, CURLOPT_POST, FALSE);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookies);

$data = curl_exec($ch);

echo $data;

curl_close($ch);

But I still get the page as it looks without a POST. Am I missing something?

nielsv

1 Answer


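HTTP is a stateless protocol, which means the client and server have no built-in way of tracking the state of the application from one request to the next. Various technologies, such as cookies, have been invented to work around this. ViewState and event validation are two techniques ASP.NET uses to give a web page a stateful feel. ASP.NET writes both into hidden form fields (__VIEWSTATE and __EVENTVALIDATION) on every page it renders and expects them back in the next POST, so you have to request the page first, read those values, and send them along with your form data.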

Please refer to this link for more information.
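As a rough sketch of that round trip (not tested against the actual site): GET the page, read the two hidden fields out of the HTML, then POST them back together with the form fields on the same cURL handle. The function names below are made up for illustration, and the form field names are taken from the question on the assumption that the `name` attributes use the `$` form (ASP.NET uses `$` in `name` and `_` in `id`). Also note that the second snippet in the question sets CURLOPT_POST back to FALSE before its last curl_exec, so the POST is never actually sent.

```php
<?php
// Read the value of an <input> with the given name out of an HTML string.
// Returns null when the field is not present.
function extractHiddenField(string $html, string $name): ?string
{
    if ($html === '') {
        return null;
    }
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // suppress warnings from sloppy real-world markup
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query(sprintf('//input[@name="%s"]', $name));
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $nodes->item(0)->getAttribute('value');
}

// Build the urlencoded POST body: the ASP.NET state fields found in $html
// plus the caller's own form fields.
function buildPostbackBody(string $html, array $fields): string
{
    $state = [];
    foreach (['__VIEWSTATE', '__EVENTVALIDATION'] as $name) {
        $value = extractHiddenField($html, $name);
        if ($value !== null) {
            $state[$name] = $value;
        }
    }
    return http_build_query($state + $fields);
}

// GET the page, then POST the form back on the same handle so that any
// cookies set by the first response are reused.
function fetchAfterPostback(string $url, array $fields, string $cookieFile): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEJAR      => $cookieFile,
        CURLOPT_COOKIEFILE     => $cookieFile,
    ]);
    $firstPage = curl_exec($ch);
    if ($firstPage === false) {
        throw new RuntimeException(curl_error($ch));
    }
    // Switch to POST and keep it that way until the request has been sent.
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, buildPostbackBody($firstPage, $fields));
    $result = curl_exec($ch);
    if ($result === false) {
        throw new RuntimeException(curl_error($ch));
    }
    return $result;
}

// Usage (field names are assumptions based on the question; single quotes
// keep PHP from treating the $ as a variable):
// $html = fetchAfterPostback('http://kovv.mavari.be/kalender.aspx', [
//     'ctl00$ContentPlaceHolder1$ddlGeslacht' => 'Heren',
//     'ctl00$ContentPlaceHolder1$btnZoek'     => 'zoek',
// ], '/tmp/cookie.txt');
```

The resulting $html can then be handed to the DomCrawler as before. If the server still returns the initial page, compare your POST body with what the browser sends (the network tab of the developer tools), since the form may contain other hidden fields that also have to be sent.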

Pawan