I am attempting to use PHP SoapClient to execute requests to a third-party application. When I create the SoapClient object I get an error about premature end of data for the WSDL. In trying to diagnose the error I found that file_get_contents() for the WSDL URI does not return the entire XML. In fact, it frequently returns different amounts of the WSDL. Here is my test program:

$xml = file_get_contents('https://webservices3.autotask.net/atservices/1.6/atws.wsdl');
echo $xml . "\n";
echo strlen($xml). "\n";

I get around 57k bytes most of the time (195628 is the correct value), sometimes more, and very rarely the entire XML. I believe this is a PHP issue because a shell loop that calls curl or wget 100 times each for this URI returns the entire file every time. I am on PHP 5.4.16, which I know is old (2013), but this process was working for about a month and then stopped entirely.

I've tried changing timeouts, HTTP protocol versions, PHP memory settings, but I can't figure out why file_get_contents would behave this way. Any suggestions are appreciated.

Curl test:

for a in $( seq 1 100 ); do curl -o wsdl.$a https://webservices3.autotask.net/atservices/1.6/atws.wsdl; done

Wget test:

for a in $( seq 1 100 ); do wget -O wsdl.$a https://webservices3.autotask.net/atservices/1.6/atws.wsdl; done
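
For a like-for-like comparison with the shell loops, the same repeated check can be sketched on the PHP side. This is only a rough sketch: `count_short_reads` is a hypothetical helper name, and the fetcher is passed in as a callable so the counting logic can be tried without network access.

```php
<?php
/**
 * Call $fetch $runs times and count how many results are shorter than
 * $expected bytes. $fetch is any callable that returns the body as a
 * string, or false on failure.
 */
function count_short_reads(callable $fetch, $expected, $runs = 100)
{
    $short = 0;
    for ($i = 0; $i < $runs; $i++) {
        $body = $fetch();
        if ($body === false || strlen($body) < $expected) {
            $short++;
        }
    }
    return $short;
}

// Usage against the real endpoint (network access required):
// $url = 'https://webservices3.autotask.net/atservices/1.6/atws.wsdl';
// echo count_short_reads(function () use ($url) {
//     return @file_get_contents($url);
// }, 195628), " of 100 reads were short\n";
```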

Update 1:

Setting maxlen to some stupid large number does not affect the behavior:

$xml = file_get_contents('https://webservices3.autotask.net/atservices/1.6/atws.wsdl', false, null, 0, 999999);
echo $xml . "\n";
echo strlen($xml). "\n";

Update 2:

$ curl -s -D /dev/stderr -- https://webservices3.autotask.net/atservices/1.6/atws.wsdl > /dev/null
HTTP/1.1 200 OK
Content-Type: text/xml
Last-Modified: Wed, 29 Apr 2020 14:38:25 GMT
Accept-Ranges: bytes
ETag: "39163cd7331ed61:0"
Server: Microsoft-IIS/8.5
X-Powered-By: ASP.NET
Content-Security-Policy: default-src 'self' https: *;script-src 'self' 'unsafe-inline' 'unsafe-eval' https: *;style-src 'self' 'unsafe-inline';img-src 'self' https://walkme.psa.datto.com/Images/ data: https://www.datto.com/img/
Date: Fri, 08 May 2020 15:22:28 GMT
Content-Length: 195628

Here are the response headers as PHP reports them:

$xml = file_get_contents('https://webservices3.autotask.net/atservices/1.6/atws.wsdl');
echo $xml . "\n";
echo strlen($xml). "\n";
var_dump($http_response_header);

array(11) {
  [0]=> string(15) "HTTP/1.1 200 OK"
  [1]=> string(22) "Content-Type: text/xml"
  [2]=> string(44) "Last-Modified: Wed, 29 Apr 2020 14:38:25 GMT"
  [3]=> string(20) "Accept-Ranges: bytes"
  [4]=> string(25) "ETag: "39163cd7331ed61:0""
  [5]=> string(25) "Server: Microsoft-IIS/8.5"
  [6]=> string(21) "X-Powered-By: ASP.NET"
  [7]=> string(228) "Content-Security-Policy: default-src 'self' https: *;script-src 'self' 'unsafe-inline' 'unsafe-eval' https: *;style-src 'self' 'unsafe-inline';img-src 'self' https://walkme.psa.datto.com/Images/ data: https://www.datto.com/img/ "
  [8]=> string(35) "Date: Fri, 08 May 2020 15:26:54 GMT"
  [9]=> string(22) "Connection: keep-alive"
  [10]=> string(22) "Content-Length: 195628"
}
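
Since PHP exposes the response headers, a truncated read can be detected by comparing the body length with the declared Content-Length. The following is a minimal sketch; `declared_content_length` and `is_truncated` are hypothetical helper names, and it assumes the header array has the same shape as `$http_response_header` above.

```php
<?php
/**
 * Return the Content-Length declared in a $http_response_header-style
 * array, or null if no header with exactly that name is present.
 * If the header appears more than once (e.g. after redirects), the last wins.
 */
function declared_content_length(array $headers)
{
    foreach ($headers as $line) {
        if (preg_match('/^Content-Length:\s*(\d+)\s*$/i', $line, $m)) {
            $length = (int) $m[1];
        }
    }
    return isset($length) ? $length : null;
}

/** True when the body is shorter than the declared length. */
function is_truncated($body, array $headers)
{
    $declared = declared_content_length($headers);
    return $declared !== null && strlen($body) < $declared;
}

// Usage with the variables from the test program above:
// var_dump(is_truncated($xml, $http_response_header));
```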

Update 3:

Corrupt Content-Length header from PHP with gzip:

$ctx = stream_context_create(array(
    'http' => array(
        'header' => "Accept-Encoding: gzip\r\n"
     )
));
$xml = file_get_contents('https://webservices3.autotask.net/atservices/1.6/atws.wsdl', false, $ctx);
var_dump($http_response_header);

array(12) {
  [0]=> string(15) "HTTP/1.1 200 OK"
  [1]=> string(22) "Content-Type: text/xml"
  [2]=> string(44) "Last-Modified: Wed, 29 Apr 2020 14:35:51 GMT"
  [3]=> string(20) "Accept-Ranges: bytes"
  [4]=> string(24) "ETag: "b376e7b331ed61:0""
  [5]=> string(25) "Server: Microsoft-IIS/8.5"
  [6]=> string(21) "X-Powered-By: ASP.NET"
  [7]=> string(228) "Content-Security-Policy: default-src 'self' https: *;script-src 'self' 'unsafe-inline' 'unsafe-eval' https: *;style-src 'self' 'unsafe-inline';img-src 'self' https://walkme.psa.datto.com/Images/ data: https://www.datto.com/img/ "
  [8]=> string(35) "Date: Fri, 08 May 2020 15:44:12 GMT"
  [9]=> string(22) "Connection: keep-alive"
  [10]=> string(22) "ntCoent-Length: 195628"
  [11]=> string(22) "Content-Encoding: gzip"

}
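
The mangled name in entry [10] uses the same letters as Content-Length in a different order. A rough way to flag that kind of header automatically is to compare sorted letters, sketched below. `letter_signature` and `find_jumbled_names` are hypothetical helper names; this only detects the symptom and says nothing about its cause.

```php
<?php
/** Lower-case a header name and sort its characters into an order-independent key. */
function letter_signature($name)
{
    $chars = str_split(strtolower($name));
    sort($chars);
    return implode('', $chars);
}

/**
 * Return the header names in $headers that are a scrambled spelling of
 * $expected: same letters, different order.
 */
function find_jumbled_names(array $headers, $expected = 'Content-Length')
{
    $found = array();
    foreach ($headers as $line) {
        $parts = explode(':', $line, 2);
        if (count($parts) !== 2) {
            continue; // status line such as "HTTP/1.1 200 OK"
        }
        $name = trim($parts[0]);
        if (strcasecmp($name, $expected) !== 0
            && letter_signature($name) === letter_signature($expected)) {
            $found[] = $name;
        }
    }
    return $found;
}

// Usage: print_r(find_jumbled_names($http_response_header));
```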

Update 4:

Headers from curl with gzip (note they look correct):

$ curl --compressed -s -D /dev/stderr -- https://webservices3.autotask.net/atservices/1.6/atws.wsdl > /dev/null
HTTP/1.1 200 OK
Content-Type: text/xml
Content-Encoding: gzip
Last-Modified: Wed, 29 Apr 2020 14:35:51 GMT
Accept-Ranges: bytes
ETag: "807d37b331ed61:0"
Vary: Accept-Encoding
Server: Microsoft-IIS/8.5
X-Powered-By: ASP.NET
Content-Security-Policy: default-src 'self' https: *;script-src 'self' 'unsafe-inline' 'unsafe-eval' https: *;style-src 'self' 'unsafe-inline';img-src 'self' https://walkme.psa.datto.com/Images/ data: https://www.datto.com/img/
Date: Fri, 08 May 2020 16:12:13 GMT
Content-Length: 13192

I was able to force SoapClient not to use gzip, and this resolved the issue, albeit inefficiently. We still don't have a root cause for why PHP mangles the headers.

// Autotask Client options
$auth_opts = array(
    'login'    => $username,
    'password' => $password,
    'trace'    => 1,
    'http'     => array(
        'header' => array(
            'Accept-Encoding' => 'identity' // here be dragons
        )
    )
);
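
The same request header can also be supplied through SoapClient's documented `stream_context` option, built with `stream_context_create()` as in Update 3. This is a sketch only: whether it changes the behavior against this endpoint is an assumption that has not been verified here, and the commented-out wiring reuses the `$username` and `$password` variables from the options above.

```php
<?php
// Build a stream context that asks the server for an uncompressed response.
$context = stream_context_create(array(
    'http' => array(
        'header' => "Accept-Encoding: identity\r\n",
    ),
));

// Hypothetical wiring (requires the soap extension and network access):
// $client = new SoapClient('https://webservices3.autotask.net/atservices/1.6/atws.wsdl', array(
//     'login'          => $username,
//     'password'       => $password,
//     'trace'          => 1,
//     'stream_context' => $context,
// ));
```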

Update 5:

We confirmed this is still reproducible in PHP 7.2. I have opened a bug with the PHP team.

Danack
Kevin Seymour
  • https://www.php.net/manual/en/function.file-get-contents.php Try playing with the `maxlen` parameter – ExploitFate May 08 '20 at 15:09
  • Thanks for your suggestion, but according to the documentation that is not required and PHP should return the entire file. I have updated my question to show that setting maxlen does not affect the behavior. – Kevin Seymour May 08 '20 at 15:11
  • Kevin, you said "_this process was working for about a month and then just stopped entirely._" When did it happen? `Last-Modified: Wed, 29 Apr 2020 14:35:51 GMT`. Maybe it's a proxy? Try adding a GET parameter to the URL: `file_get_contents('https://webservices3.autotask.net/atservices/1.6/atws.wsdl'.'?t='.time());` – ExploitFate May 08 '20 at 22:23
  • Hard to say: the action is initiated by a user so we might make the request several times in a day or none for a few days. The first failure I see is on May 4th. – Kevin Seymour May 11 '20 at 13:20

2 Answers

webservices3.autotask.net returns a bad header in its response:

HTTP/1.1 200 OK
Content-Type: text/xml
Accept-Ranges: bytes
Server: Microsoft-IIS/8.5
X-Powered-By: ASP.NET
Cteonnt-Length: 195628
Cache-Control: private
Content-Encoding: gzip
Transfer-Encoding: chunked

Note: Cteonnt-Length: 195628 should be Content-Length: 195628.

That is why file_get_contents cannot handle the response correctly.

So either fix the response or set maxlen.

UPD: It's a jumbled header. This should work: https://stackoverflow.com/a/8582042/3849743

ExploitFate

This appears to be a bug in PHP. There is a bug report linked at the end of the question.

Kevin Seymour