Normally I use file_get_contents to get the HTML of a page, but for one particular site, everything I have tried returns characters like these instead of the HTML:
J��t��`$ؐ@������iG#)�*��eVe]f@�흼
Does anyone have any idea what this might be? I am wondering whether the site has a protection system that detects whether a request is made by a real user or by a PHP script, and serves this output in the second case.
I have also used cURL to get the page and specified a browser User-Agent, but I guess I should take it further by using cURL cookies or more.
The function I use (cURL version):
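function getPage($url) {
    // Placeholder list: replace with real proxy addresses (host:port).
    $proxies = array();
    $proxies[] = 'proxies here';
    // Pick one proxy at random, if any are configured.
    $proxy = !empty($proxies) ? $proxies[array_rand($proxies)] : null;

    $ch = curl_init();
    // Headers copied from a desktop Firefox request.
    $header = array(
        'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12',
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language: en-us,en;q=0.5',
        'Accept-Encoding: gzip,deflate',
        'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7',
        'Keep-Alive: 115',
        'Connection: keep-alive'
    );
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Route the request through the selected proxy (it was chosen but never applied before).
    if ($proxy !== null) {
        curl_setopt($ch, CURLOPT_PROXY, $proxy);
    }
    // Read cookies from and write them back to the same file.
    curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies.txt');
    curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookies.txt');
    curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
    $result = curl_exec($ch);
    // Close the handle before returning; a statement placed after "return" never runs.
    curl_close($ch);
    return $result;
}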
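One possible explanation worth checking, offered as a sketch and not a confirmed diagnosis: the function sends 'Accept-Encoding: gzip,deflate' by hand, so the server may answer with a compressed body, and cURL returns those bytes undecoded unless it is told to handle the encoding itself. The snippet below does not contact the site; it only simulates a compressed body locally to show how one could be recognised and decoded. The $html string is a made-up stand-in for a real page, and the zlib extension (gzencode, gzdecode) is assumed to be available.

```php
<?php
// Sketch only: simulates a gzip-compressed response body locally, no network needed.
$html = '<html><body><p>Hello</p></body></html>'; // hypothetical page content

// Roughly what a server may send when the client advertises "Accept-Encoding: gzip".
$compressed = gzencode($html);

// Gzip streams start with the magic bytes 0x1f 0x8b; echoed raw they look like garbage.
$looksGzipped = strncmp($compressed, "\x1f\x8b", 2) === 0;

// Decoding by hand recovers the original markup.
$decoded = $looksGzipped ? gzdecode($compressed) : $compressed;

var_dump($looksGzipped);
var_dump($decoded === $html);
```

If this turns out to be the cause, an alternative to decoding by hand is to drop the manual Accept-Encoding header and call curl_setopt($ch, CURLOPT_ENCODING, ''), which asks cURL to request every encoding it supports and to decode the response automatically.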
Any help will be greatly appreciated.