0

I was trying to download a pdf file from ieeexplore via PHP, but it seems not working well. Suppose the URL is http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5534992. I wrote the following PHP code:

function get_web_page($url) {
    $ch  = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_COOKIESESSION, true);
    curl_setopt($ch, CURLOPT_COOKIEJAR, '/tmp/cookie.txt');
    curl_setopt($ch, CURLOPT_COOKIEFILE, '/tmp/cookie.txt');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_HEADER, 0);        
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    $page = curl_exec($ch);
    curl_close($ch);
    return $page;
}

But this code failed with nothing downloaded. I checked the received http header in the following:

HTTP/1.0 200
Connection established
HTTP/1.1 302 Moved Temporarily
Server: Sun-ONE-Web-Server/6.1
Date: Mon, 09 Jul 2012 22:11:50 GMT
Content-length: 0
Content-type: text/html
Set-Cookie: ERIGHTS=na2vLnqZwz9xxRfO2zN8Ny66f0vHi85YE*ynGx2BtGx2FmIHkiEyx2Bg89Db6Qx3Dx3D-18x2dHeJj2k3B7UHsoix2BefrHXeAx3Dx3Dusln2oQUqj3KXiQXjOYx2BMwx3Dx3D-UQmTydx2FMwnGJOyKUw5iVDAx3Dx3D-eV0zE6ztXYKrVZluJrMMbAx3Dx3D;path=/;domain=.ieee.org
Location: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5534992&tag=1
Set-Cookie: WLSESSION=874668684.20480.0000; expires=Tue, 10-Jul-2012 22:11:48 GMT; path=/

HTTP/1.1 200 OK
Server: Sun-ONE-Web-Server/6.1
Date: Mon, 09 Jul 2012 22:11:50 GMT
Content-length: 203
Content-type: text/html; charset=UTF-8
Cache-Control: private
Product: 254
Inst: 9690
Licenseowner: 9690
Member: 0
Cache-Control: no-cache
Pragma: No-cache
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Set-Cookie: xploreCookies={"standardsLicenseId":"0","openUrl":"http://linkserv.lib.utk.edu:9003/sfx","enterpriseLicenseId":"0","isIp":"true","desktopReportingUrl":"null","openUrlImgLoc":"http://www.lib.utk.edu/eresources/sfx2.gif","products":"IEL|VDE|","contactName":"NA","isChargebackUser":"false","contactEmail":"NA","oldSessionKey":"na2vLnqZwz9xxRfO2zN8Ny66f0vHi85YE*ynGx2BtGx2FmIHkiEyx2Bg89Db6Qx3Dx3D-18x2dHeJj2k3B7UHsoix2BefrHXeAx3Dx3Dusln2oQUqj3KXiQXjOYx2BMwx3Dx3D-UQmTydx2FMwnGJOyKUw5iVDAx3Dx3D-eV0zE6ztXYKrVZluJrMMbAx3Dx3D","userIds":"9690","instImage":"","isInst":"true","isDelegatedAdmin":"false","isMember":"false","instName":"UNIVERSITY OF TENNESSEE","customerSurvey":"NA","smallBusinessLicenseId":"0","openUrlTxt":"NA"}; domain=.ieee.org; path=/
Set-Cookie: JSESSIONID=V6pLP7XH4nvtQYcvmVc1ry1Y51vDHhkG8SGn9y0LG8XJv3k3hmJs!-1711984930; path=/; HttpOnly
X-Powered-By: Servlet/2.5 JSP/2.1

An error has occurred while trying to load your document. Please try again. If you continue to experience issues, please contact Customer Service.2016

However, if you paste the URL in the web browser, you may access the PDF file directly.

Since I was in my University's domain, I did not need to worry about the access license to this PDF file.

Anyone has ideas?

Thanks~

drew010
  • 68,777
  • 11
  • 134
  • 162
little-eyes
  • 945
  • 2
  • 9
  • 14
  • 1
    See if setting a user-agent string (`CURLOPT_USERAGENT`) makes any difference. Often times if a bogus or no user-agent is sent, sites will behave differently or outright reject requests. – drew010 Jul 09 '12 at 22:33
  • Hi @drew010, I tried this user agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.47 Safari/536.11, but it still cannot work... – little-eyes Jul 09 '12 at 22:39
  • Even if nothing is downloaded, something is downloaded. There is no right or wrong in HTTP, there is just request and repsonse. So what *exactly* went wrong? – hakre Jul 09 '12 at 23:07
  • Hi @hakre, this code is supposed to download the pdf file specified by the URL, but it got nothing but headers after I executed the code. – little-eyes Jul 10 '12 at 03:44
  • @little-eyes: Actually there is an answer text given for humans wih in part tells the following: *"please contact Customer Service.2016"* I'd say you should contact customer service. Also if you want to learn more how curl connects and which headers are send, checkout the verbose mode: http://stackoverflow.com/a/9707622/367456 – hakre Jul 10 '12 at 07:59

1 Answers1

1

Test This Code Here past Proxy ip ( if you don't fill it no proxy will be used ) Value

        </tr>
        <tr>
            <td> For connceting to Database paste Its Url</td>
            <td><input type="text" name="ssurl" value="http://www.sciencedirect.com/science/article/pii/S0301421504000928" /></td> /></td>

        </tr>
        <tr>
            <td>Proxy Ip</td>
            <td><input type="text" name="ssproxyip" value="202.202.0.163"/></td>

        </tr>
        <tr>
            <td>Proxyport</td>
            <td><input type="text" name="ssproxyport" value="3128"/></td>

        </tr>
        <tr>
            <td>Proxy Username & password   (username:password)</td>
            <td><input type="text" name="ssproxyusernamepassword"/></td>

        </tr>
    </table>
    <input type="submit" name="ssurlsubmit" value="submit" />
    </form>



</body>
</html>

<?php

/**
 * @author nnnnn
 * @copyright 2012
 */
//removes string from the end of other
if (isset($_POST['ssurlsubmit'])) {
function removeFromEnd($string, $stringToRemove) {
     $stringToRemoveLen = strlen($stringToRemove);
     $stringLen = strlen($string);

     $pos = $stringLen - $stringToRemoveLen;    

     $out = substr($string, 0, $pos);

     return $out;
 }

//$string = 'picture.jpg.jpg';
//$string = removeFromEnd($string, '.jpg');
//$url='http://127.0.0.1/leech/';
//$url='http://www.sciencedirect.com/science/article/pii/S0301421504000928';


global $ssurl,$file_n;
$url=$_POST['ssurl'];
$file_n=$_POST['file_name'];
echo "URL:".$url.'<br />';


//$url = 'http://pdn.sciencedirect.com/science?_ob=MiamiImageURL&_cid=271097&_user=2501846&_pii=S0301421504000928&_check=y&_origin=article&_zone=toolbar&_coverDate=2005--31&view=c&originContentFamily=serial&wchp=dGLbVlB-zSkWb&md5=f51bc09e08b4d3eafb759ef5c08724c4&pid=1-s2.0-S0301421504000928-main.pdf';

 if (isset($_POST['ssproxyip'])) {
$proxyip=$_POST['ssproxyip'];
$proxyoprt=$_POST['ssproxyport'];


}
if (isset($_POST['proxyuserpassword'])) {
$proxyuserpassword=$_POST['ssproxyuserpassword'];

}

$mypath = getcwd();
            $mypath = preg_replace('/\\\\/', '/', $mypath);
            $rand = rand(1, 15000);

            if (!file_exists("$mypath/cookies") and !is_dir("$mypath/cookies")) {
mkdir("$mypath/cookies");
} 

            $cookie_file_path = "$mypath/cookies/cookie$rand.txt";
            echo $cookie_file_path.'<br />';
            echo 'cookie2:  '.$cookie_file_path.'<br/>';
if (! file_exists($cookie_file_path) || ! is_writable($cookie_file_path))
{

 //$fp1 = fopen($cookie_file_path, "w");
  //fclose($fp1);
  }
  if (! file_exists($cookie_file_path) || ! is_writable($cookie_file_path))
  {
    echo 'Cookie file missing or not writable.';
    //exit;
}
if ( ! extension_loaded('curl'))
{
    echo "You need to load/activate the curl extension.";
}


 $ss=substr($url,-4);
 $string = removeFromEnd($url, '.pdf');
 echo "ss:  ".$ss.'<br />'; 
 //$url = 'http://www.sciencedirect.com/science/jrnlallbooks/a/fulltext';
//$proxy = '200.93.148.72:3128';
$ext = substr($fileName, strrpos($fileName, '.') + 1);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
//curl_setopt($ch, CURLOPT_PROXY, "203.64.181.50"); //your proxy url
//curl_setopt($ch, CURLOPT_PROXYPORT, "3128"); // your proxy port number 
//curl_setopt($ch, CURLOPT_PROXYUSERPWD, "bjm:12345"); //username:pass 
//curl_setopt($ch, CURLOPT_PROXY, "202.202.0.163"); //your proxy url
//curl_setopt($ch, CURLOPT_PROXYPORT, "3128"); // your proxy port number 

if (isset($_POST['ssproxyip'])) {
curl_setopt($ch, CURLOPT_PROXY, $proxyip); //your proxy url
curl_setopt($ch, CURLOPT_PROXYPORT, $proxyport); // your proxy port number 

}
if (isset($_POST['proxyuserpassword'])) {
curl_setopt($ch, CURLOPT_PROXYUSERPWD,$proxyuserpassword); //username:pass 
}
curl_setopt($ch, CURLOPT_TIMEOUT, 0);
//curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);


curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIESESSION, true);



curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0)');
//curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
//curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
//curl_setopt($curl, CURLOPT_VERBOSE, 1);
/*
*/
//echo $string;
echo substr($url,-4).'<br />';
//echo $url;   
$proxy = $proxyip.':'.$proxyoprt;
echo 'proxyip:   '.$proxyip.'<br />';
echo 'proxy:  '.$proxy.'<br />';
$timeout = 5;
$splited = explode(':',$proxy); // Separate IP and port
echo 'splited:   '.$splited.'<br />';
echo $splited[0].'<br />';
echo $splited[1].'<br />';
/*
//if($con = @fsockopen($splited[0], $splited[1], $errorNumber, $errorMessage, $timeout)) 
if($con = @fsockopen($proxyip, $proxyoprt, $errorNumber, $errorMessage, $timeout)) 
{
    echo 'Connection successful, PROXY works!'.'<br />';
} else {
 echo 'Connection FAILED, PROXY FAIL!'.'<br />';
    echo $errorNumber .'<br />';
    echo ' ' . $errorMessage.'<br />';
}
*/
echo "if not run".'<br />';
echo $ss.'<br />';
if ($ss=='.zip' || $ss=='r.gz' ||  $ss=='.pdf') 
 { 
 echo "if runed".'<br />';
 ini_set('max_allowed_packet', '164M');
 ini_set('mysql.wait_timeout', 600);


 ini_set('max_execution_time', '200');

 ini_set('mysql.reconnect', 'On');
  ini_set('mysql.connect_timeout', 300);
ini_set('default_socket_timeout', 300);

 $file = basename($url);    
//$file_extection=extension($url);

echo "file:  ".$file.'<br />';
echo "url:   ".$url.'<br />' ;
$dir= $file;

//    $url  = 'http://www.example.com/a-large-file.zip';
    //$path = $_SERVER['DOCUMENT_ROOT'] . '/downloads/'.$file;
    $path = $_SERVER[$url] . $file;
     if (isset($file_n )&& strlen($file_n)>0) {
    $path=$file_n;
    }
    echo "path:   ".$path.'<br />' ;
        $fp = fopen($path, 'w');
//$fp = fopen(basename($url).'zip', 'w+');
/**
* Ask cURL to write the contents to a file
*/
curl_setopt($ch, CURLOPT_FILE, $fp);
$curl_scraped_page = curl_exec($ch);
$file = 'file.pdf';
$fileName = 'fileName.pdf';
file_put_contents($file, $curl_scraped_page);
file_put_contents($path, $curl_scraped_page);

fclose($fp);

header('Content-type: application/pdf');
header('Content-Disposition: inline; filename="' . $filename . '"');
header('Content-Transfer-Encoding: binary');
header('Content-Length: ' . filesize($file));
header('Accept-Ranges: bytes');

readfile($file);

echo "File DONE".'<br />';
}else {
    echo "curl_scraped_page   ".$curl_scraped_page.'<br />' ;


$curl_scraped_page = curl_exec($ch);
$file = 'file.pdf';
$fileName = 'fileName.pdf';
file_put_contents($file, $curl_scraped_page);
file_put_contents($path, $curl_scraped_page);


header('Content-type: application/pdf');
header('Content-Disposition: inline; filename="' . $filename . '"');
header('Content-Transfer-Encoding: binary');
header('Content-Length: ' . filesize($file));
header('Accept-Ranges: bytes');
}
/*
$file = $url; // URL to the file

$contents = file_get_contents($file); // read the remote file

touch('somelocal.pdf'); // create a local EMPTY copy

file_put_contents('somelocal.pdf', $contents); // put the fetchted data into the newly created file
*/
curl_close($ch);
echo 'curl_close'.'<br />';
echo $curl_scraped_page.'<br />';
}

?>

you can test it at this link: http://apr9ss.tk/curl-proxy-down-work2.php