php file_get_contents strange characters

Question

Normally I use file_get_contents to get the HTML structure of a page, but with one particular site I have tried, all I get instead of the HTML structure is characters like these:

J��t��`$ؐ@��iG#)�*��eVe]f@�흼

Does anyone have any idea what it might be? I am wondering whether the site has a protective system that detects whether a request is made by a real user or by a PHP script, and in the second case displays this.

I have used cURL to get the page and specified a browser user agent, but I guess I should take it further by using cURL cookies or more.

The function I use (cURL version):

function getPage($url) {

    $proxies = array();
    $proxies[] = 'proxies here';


    if (isset($proxies)) {
        $proxy = $proxies[array_rand($proxies)];
    }


    $ch = curl_init();

    $header = array(
        'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12',
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language: en-us,en;q=0.5',
        'Accept-Encoding: gzip,deflate',
        'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7',
        'Keep-Alive: 115',
        'Connection: keep-alive');

    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies.txt');
    curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookies.txt');
    curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
    $result = curl_exec($ch);

    // Close the handle before returning; after the return it would never run.
    curl_close($ch);

    return $result;
}

Any help will be greatly appreciated.

Can you show us the code that you're using to get the content? — Mathlight, Jul 02 '14 at 11:12
[Mastering UTF-8 encoding in PHP and MySQL](http://blog.flowl.info/2014/mastering-unicodeutf-8-encoding-php/) — Daniel W., Jul 02 '14 at 11:13
It would help a lot to see what particular site you are talking about. — davidkonrad, Jul 02 '14 at 11:15
The website is in English and I don't see any charset specification in the HTML structure. — inrob, Jul 02 '14 at 11:18

score -1 · Accepted Answer · answered Jul 02 '14 at 11:14

You're dealing with a character encoding issue; the iconv function may help you.
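As a rough sketch of how that could look: the helper below converts a fetched body to UTF-8 with iconv when it is not already valid UTF-8. Because the request in the question sends `Accept-Encoding: gzip,deflate` by hand, the server may also be replying with a compressed body, which would look like binary noise, so the sketch first checks for the gzip magic bytes and decompresses if they are present. The name `decodeBody` and the default source charset `ISO-8859-1` are assumptions for illustration, not something taken from the question or the target site, and the zlib and iconv extensions are assumed to be available.

```php
<?php
// Hypothetical helper (not from the original post): try to turn a raw
// response body into readable UTF-8 text.
function decodeBody(string $body, string $fromCharset = 'ISO-8859-1'): string
{
    // A gzip stream starts with the magic bytes 0x1f 0x8b.
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        $decoded = gzdecode($body);
        if ($decoded !== false) {
            $body = $decoded;
        }
    }

    // preg_match with the /u modifier only matches when the subject is valid UTF-8.
    $isUtf8 = preg_match('//u', $body) === 1;

    // Assumption: if the text is not UTF-8, it is in $fromCharset.
    if (!$isUtf8) {
        $converted = iconv($fromCharset, 'UTF-8//TRANSLIT', $body);
        if ($converted !== false) {
            $body = $converted;
        }
    }

    return $body;
}
```

It would be called on the return value of the fetch function, for example `decodeBody(getPage($url))`. If compression does turn out to be the cause, setting `curl_setopt($ch, CURLOPT_ENCODING, '');` on the handle asks cURL to decode the encodings it supports itself, which also covers deflate responses that the magic-byte check above does not detect.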

Vlas Bashynskyi