
Using scrapy 1.6.0 (twisted 18.9.0, pyopenssl 19.0.0, openssl 1.0.2r, osx 10.14.3). I've ruled out user agent and robots.txt. Seems to be a certificate negotiation issue. There is no web proxy involved.

Destination is https://www.labor.ny.gov/

To reproduce:

04:49:59 dork@Dorks-MacBook:~
0 $ scrapy shell
.
.
.
>>> fetch('https://www.labor.ny.gov')
2019-04-05 16:45:11 [scrapy.core.engine] INFO: Spider opened
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/Users/dork/project/venv/lib/python3.6/site-packages/scrapy/shell.py", line 115, in fetch
    reactor, self._schedule, request, spider)
  File "/Users/dork/project/venv/lib/python3.6/site-packages/twisted/internet/threads.py", line 122, in blockingCallFromThread
    result.raiseException()
  File "/Users/dork/project/venv/lib/python3.6/site-packages/twisted/python/failure.py", line 467, in raiseException
    raise self.value.with_traceback(self.tb)
twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]

Connecting and negotiating via OpenSSL directly on the command line fails as well:

0 $ openssl version
OpenSSL 1.0.2r  26 Feb 2019
04:49:59 dork@Dorks-MacBook:~
0 $ openssl s_client -connect www.labor.ny.gov:443
CONNECTED(00000003)
4472571500:error:140790E5:SSL routines:ssl23_write:ssl handshake failure:s23_lib.c:177:
---
no peer certificate available
---
No client certificate CA names sent
---
SSL handshake has read 0 bytes and written 307 bytes
---
New, (NONE), Cipher is (NONE)
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : 0000
    Session-ID:
    Session-ID-ctx:
    Master-Key:
    Key-Arg   : None
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    Start Time: 1554497411
    Timeout   : 300 (sec)
    Verify return code: 0 (ok)
---

However, if I force openssl to TLSv1, the handshake succeeds. I just don't know how to force that through the scrapy -> twisted -> pyopenssl -> OpenSSL stack, or whether it's possible.

04:49:59 dork@Dorks-MacBook:~
0 $ openssl s_client -tls1 -connect www.labor.ny.gov:443
CONNECTED(00000003)
depth=2 C = BE, O = GlobalSign nv-sa, OU = Root CA, CN = GlobalSign Root CA
verify return:1
depth=1 C = BE, O = GlobalSign nv-sa, CN = GlobalSign Organization Validation CA - SHA256 - G2
verify return:1
depth=0 C = US, ST = New York, L = Albany, O = New York State Office for Technology, CN = labor.ny.gov
verify return:1
---
Certificate chain
 0 s:/C=US/ST=New York/L=Albany/O=New York State Office for Technology/CN=labor.ny.gov
   i:/C=BE/O=GlobalSign nv-sa/CN=GlobalSign Organization Validation CA - SHA256 - G2
 1 s:/C=BE/O=GlobalSign nv-sa/CN=GlobalSign Organization Validation CA - SHA256 - G2
   i:/C=BE/O=GlobalSign nv-sa/OU=Root CA/CN=GlobalSign Root CA
---
Server certificate
-----BEGIN CERTIFICATE-----
.
.
.

Postman can't fetch the page either. Anything that relies on OpenSSL seems to fail the handshake in the same way.
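To double-check that it's the protocol pin and not scrapy itself, the same TLS 1.0 pin can be reproduced from Python's standard ssl module (which also links against OpenSSL). This is just a diagnostic sketch; `tls1_fetch_head` is an illustrative helper, not anything scrapy uses:

```python
import socket
import ssl

# Pin the client to TLS 1.0 only, mirroring `openssl s_client -tls1`.
# Python's ssl module links against OpenSSL too, so if this handshake
# succeeds the problem is likely the default TLS 1.2 ClientHello.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ctx.minimum_version = ssl.TLSVersion.TLSv1
ctx.maximum_version = ssl.TLSVersion.TLSv1

def tls1_fetch_head(host):
    """Connect with TLS 1.0 pinned and issue a bare HEAD request."""
    with socket.create_connection((host, 443)) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            tls.sendall(b"HEAD / HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n")
            return tls.recv(1024)

# Network-dependent, so not run here:
# print(tls1_fetch_head('www.labor.ny.gov'))
```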

debugme

1 Answer


Not a full answer; marked community wiki (CW) in case anyone can add the scrapy (or related) part.

Man, that server is bad! It supports only SSL2, SSL3, and TLS1.0; the first two are completely broken, and the first was broken last century. It identifies as IIS/6.0, which dates to Windows Server 2003 -- end-of-life long ago.

FWLIW it's not actually version-intolerant, or broken for a hello over 256 bytes, as some defective implementations were discovered to be years ago; if I use OpenSSL 1.0.2 to send it a ClientHello offering TLS1.2 with ciphers restricted to kRSA, it correctly negotiates down to TLS1.0. It only fails for the OpenSSL >= 1.0.2 default ClientHello, which carries a significantly larger cipher list than previous versions because TLS1.2 added a whole batch of new ciphersuites for the new AEAD format and new PRF scheme. Forcing TLS1.0 has the same effect, because it makes OpenSSL offer only the smaller list of ciphersuites that were valid in TLS1.0. I vaguely recall an XP-era bug triggered by 'large' cipher lists, and that might be the problem here.
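For the scrapy part, I believe (untested against this particular server) the relevant knob is `DOWNLOADER_CLIENT_TLS_METHOD`, which scrapy passes down through Twisted to pyOpenSSL:

```python
# settings.py -- pin scrapy's HTTPS client to TLS 1.0.
# Valid values in scrapy 1.6 are 'TLS' (the default, negotiate),
# 'TLSv1.0', 'TLSv1.1', 'TLSv1.2', and 'SSLv3'.
DOWNLOADER_CLIENT_TLS_METHOD = 'TLSv1.0'
```

Or as a one-off from the shell: `scrapy shell -s DOWNLOADER_CLIENT_TLS_METHOD=TLSv1.0 'https://www.labor.ny.gov'`. Note this pins every request the crawler makes, not just requests to this host.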

It's not the certificate. The certificate is the only thing they have right.

dave_thompson_085