
Edit: I added the verbose output of both curl and pycurl below. Apart from the HTTP 200 vs. 500 response I still don't see any difference.

I would like to download a file in a Python script. I wrote curl statements which do the job in bash:

# Current time in milliseconds, minus one hour
t=$(( $(date +%s) * 1000 - 3600000 ))

# Log in as guest and extract the token from the JSON response
token=$(curl 'https://www.intomics.com/inbio/api/login_guest?ref=&_='"$t" \
    | python -c "import json,sys;obj=json.load(sys.stdin);print(obj['token']);")

# Download the file, sending the token as a cookie
curl 'https://www.intomics.com/inbio/map/api/get_data?file=InBio_Map_core_2016_09_12.tar.gz' \
    -H 'Cookie: access_token='"$token" -o result.tar.gz

However, my pycurl implementation of the same requests results in an HTTP 500 error:

import pycurl
import time
import json

t = int(time.time() * 1000) - 3600000

url   = 'https://www.intomics.com/inbio/map/api/'\
        'get_data?file=InBio_Map_core_2016_09_12.tar.gz'
login = 'https://www.intomics.com/inbio/api/login_guest?ref=&_=%u' % t

fp_login = open('imweb.login.tmp', 'wb')
fp_imweb = open('imweb.tmp.tar.gz', 'wb')

c0 = pycurl.Curl()
c0.setopt(pycurl.URL, login)
c0.setopt(pycurl.WRITEFUNCTION, fp_login.write)

c0.perform()

fp_login.close()

with open('imweb.login.tmp', 'r') as fp:
    token = json.loads(fp.read())['token']

print('Token: %s' % token)

hdrs = ['Cookie: access-token=%s' % token]

c1 = pycurl.Curl()
c1.setopt(pycurl.URL, url)
c1.setopt(pycurl.WRITEFUNCTION, fp_imweb.write)
c1.setopt(pycurl.HTTPHEADER, [h.encode('ascii') for h in hdrs])
c1.setopt(pycurl.ENCODING, 'gzip, deflate')

c1.perform()

fp_imweb.close()

I don't see any difference between the two, do you?

Edit: verbose outputs

curl called from bash:

[ bash ] * TCP_NODELAY set
[ bash ] * Connected to www.intomics.com (77.72.50.69) port 443 (#0)
[ bash ] * ALPN, offering http/1.1
[ bash ] * Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
[ bash ] * successfully set certificate verify locations:
[ bash ] *   CAfile: /etc/ssl/certs/ca-certificates.crt
[ bash ]   CApath: none
[ bash ] * TLSv1.2 (OUT), TLS header, Certificate Status (22):
[ bash ] } [5 bytes data]
[ bash ] * TLSv1.2 (OUT), TLS handshake, Client hello (1):
[ bash ] } [512 bytes data]
[ bash ] * TLSv1.2 (IN), TLS handshake, Server hello (2):
[ bash ] { [98 bytes data]
[ bash ] * TLSv1.2 (IN), TLS handshake, Certificate (11):
[ bash ] { [2279 bytes data]
[ bash ] * TLSv1.2 (IN), TLS handshake, Server key exchange (12):
[ bash ] { [333 bytes data]
[ bash ] * TLSv1.2 (IN), TLS handshake, Server finished (14):
[ bash ] { [4 bytes data]
[ bash ] * TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
[ bash ] } [70 bytes data]
[ bash ] * TLSv1.2 (OUT), TLS change cipher, Client hello (1):
[ bash ] } [1 bytes data]
[ bash ] * TLSv1.2 (OUT), TLS handshake, Finished (20):
[ bash ] } [16 bytes data]
[ bash ] * TLSv1.2 (IN), TLS change cipher, Client hello (1):
[ bash ] { [1 bytes data]
[ bash ] * TLSv1.2 (IN), TLS handshake, Finished (20):
[ bash ] { [16 bytes data]
[ bash ] * SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
[ bash ] * ALPN, server did not agree to a protocol
[ bash ] * Server certificate:
[ bash ] *  subject: OU=GT95967783; OU=See www.rapidssl.com/resources/cps (c)15; OU=Domain Control Validated - RapidSSL(R); CN=*.intomics.com
[ bash ] *  start date: Mar 24 02:21:50 2015 GMT
[ bash ] *  expire date: Mar 26 04:01:35 2018 GMT
[ bash ] *  subjectAltName: host "www.intomics.com" matched cert's "*.intomics.com"
[ bash ] *  issuer: C=US; O=GeoTrust Inc.; CN=RapidSSL SHA256 CA - G3
[ bash ] *  SSL certificate verify ok.
[ bash ] } [5 bytes data]
[ bash ] > GET /inbio/map/api/get_data?file=InBio_Map_core_2016_09_12.tar.gz HTTP/1.1
[ bash ] > Host: www.intomics.com
[ bash ] > User-Agent: curl/7.53.1
[ bash ] > Accept: */*
[ bash ] > Cookie: access_token=eyJhbGciOiJIUzUxMiIsInR5cCI6IkpXVCJ9.eyJ2IjoyLCJsb2dpbiI6bnVsbCwiZXhwIjoxNDkxODIzMjAxLCJpYXQiOjE0OTE4MjMwODEsInNpZCI6MTA4MDcsInNleHAiOjE0OTE4MjMzODEwNzcsImFtcyI6IiJ9.jMh6NUvy-Vd4YGiNSzxrbcE6c8VaO20ryvHOwx-C385GkSmaNWFZz4TSsQm2n-gf6jQvBYWC289HibRHx_tktg
[ bash ] > 
[ bash ] { [5 bytes data]
[ bash ] < HTTP/1.1 200 OK
[ bash ] < Date: Mon, 10 Apr 2017 11:18:02 GMT
[ bash ] < Server: Apache/2.4.18 (Ubuntu)
[ bash ] < Content-Disposition: attachment; filename="InBio_Map_core_2016_09_12.tar.gz"
[ bash ] < Content-Type: application/x-compressed
[ bash ] < X-Frame-Options: sameorigin
[ bash ] < Transfer-Encoding: chunked
[ bash ] < 
[ bash ] { [5 bytes data]

The same for pycurl:

[ pycurl ] Token: eyJhbGciOiJIUzUxMiIsInR5cCI6IkpXVCJ9.eyJ2IjoyLCJsb2dpbiI6bnVsbCwiZXhwIjoxNDkxODIyNzM0LCJpYXQiOjE0OTE4MjI2MTQsInNpZCI6MTA4MDUsInNleHAiOjE0OTE4MjI5MTQ2NTIsImFtcyI6IiJ9.BRH1cPnzX-iw-l01SZqxlROYsu2TiXmIZwafsmnUp30IouwM8qzXKy_Ik0UYGqsUOrhhkYVMYKRgWEtipyhEBw
[ pycurl ]   Trying 77.72.50.69...
[ pycurl ] TCP_NODELAY set
[ pycurl ] Connected to www.intomics.com (77.72.50.69) port 443 (#0)
[ pycurl ] ALPN, offering http/1.1
[ pycurl ] Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
[ pycurl ] successfully set certificate verify locations:
[ pycurl ]   CAfile: /etc/ssl/certs/ca-certificates.crt
[ pycurl ]   CApath: none
[ pycurl ] TLSv1.2 (OUT), TLS header, Certificate Status (22):
[ pycurl ] TLSv1.2 (OUT), TLS handshake, Client hello (1):
[ pycurl ] TLSv1.2 (IN), TLS handshake, Server hello (2):
[ pycurl ] TLSv1.2 (IN), TLS handshake, Certificate (11):
[ pycurl ] TLSv1.2 (IN), TLS handshake, Server key exchange (12):
[ pycurl ] TLSv1.2 (IN), TLS handshake, Server finished (14):
[ pycurl ] TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
[ pycurl ] TLSv1.2 (OUT), TLS change cipher, Client hello (1):
[ pycurl ] TLSv1.2 (OUT), TLS handshake, Finished (20):
[ pycurl ] TLSv1.2 (IN), TLS change cipher, Client hello (1):
[ pycurl ] TLSv1.2 (IN), TLS handshake, Finished (20):
[ pycurl ] SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
[ pycurl ] ALPN, server did not agree to a protocol
[ pycurl ] Server certificate:
[ pycurl ] subject: OU=GT95967783; OU=See www.rapidssl.com/resources/cps (c)15; OU=Domain Control Validated - RapidSSL(R); CN=*.intomics.com
[ pycurl ] start date: Mar 24 02:21:50 2015 GMT
[ pycurl ] expire date: Mar 26 04:01:35 2018 GMT
[ pycurl ] subjectAltName: host "www.intomics.com" matched cert's "*.intomics.com"
[ pycurl ] issuer: C=US; O=GeoTrust Inc.; CN=RapidSSL SHA256 CA - G3
[ pycurl ] SSL certificate verify ok.
[ pycurl ] GET /inbio/map/api/get_data?file=InBio_Map_core_2016_09_12.tar.gz HTTP/1.1
[ pycurl ] Host: www.intomics.com
[ pycurl ] User-Agent: PycURL/7.43.0 libcurl/7.53.1 OpenSSL/1.0.2k zlib/1.2.11 libpsl/0.17.0 (+libicu/58.2) libssh2/1.8.0
[ pycurl ] Accept: */*
[ pycurl ] Cookie: access-token=eyJhbGciOiJIUzUxMiIsInR5cCI6IkpXVCJ9.eyJ2IjoyLCJsb2dpbiI6bnVsbCwiZXhwIjoxNDkxODIyNzM0LCJpYXQiOjE0OTE4MjI2MTQsInNpZCI6MTA4MDUsInNleHAiOjE0OTE4MjI5MTQ2NTIsImFtcyI6IiJ9.BRH1cPnzX-iw-l01SZqxlROYsu2TiXmIZwafsmnUp30IouwM8qzXKy_Ik0UYGqsUOrhhkYVMYKRgWEtipyhEBw
[ pycurl ] HTTP/1.1 500 Internal Server Error
[ pycurl ] Date: Mon, 10 Apr 2017 11:10:14 GMT
[ pycurl ] Server: Apache/2.4.18 (Ubuntu)
[ pycurl ] Content-Length: 606
[ pycurl ] Content-Type: text/html; charset=iso-8859-1
[ pycurl ] X-Frame-Options: sameorigin
[ pycurl ] Connection: close
[ pycurl ] <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>500 Internal Server Error</title>\n</head><body>\n<h1>Internal Server Error</h1>\n<p>The server encountered an internal error or\nmisconfiguration and was unable to complete\nyour request.</p>\n<p>Please contact the server administrator at \n lwi@intomics.com to inform them of the time this error occurred,\n and the actions you performed just before this error.</p>\n<p>More information about this error may be available\nin the server error log.</p>\n<hr>\n<address>Apache/2.4.18 (Ubuntu) Server at inbiomapapi Port 80</address>\n</body></html>\n
[ pycurl ] Closing connection 0
[ pycurl ] TLSv1.2 (OUT), TLS alert, Client hello (1)
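
Comparing the request headers of the two logs line by line is easier with a small script. The sketch below parses the `Cookie` header out of each request and prints the cookie names. The request strings are copied from the logs above with the tokens shortened to `<token>`, and `cookie_names` is a hypothetical helper, not part of curl or pycurl. It shows that bash sends `access_token` (underscore) while pycurl sends `access-token` (hyphen). Whether this mismatch is what causes the HTTP 500 is an assumption; it has not been verified against the server.

```python
# Request headers copied from the two verbose logs (tokens shortened).
CURL_REQUEST = (
    "GET /inbio/map/api/get_data?file=InBio_Map_core_2016_09_12.tar.gz HTTP/1.1\r\n"
    "Host: www.intomics.com\r\n"
    "User-Agent: curl/7.53.1\r\n"
    "Accept: */*\r\n"
    "Cookie: access_token=<token>\r\n"
)
PYCURL_REQUEST = (
    "GET /inbio/map/api/get_data?file=InBio_Map_core_2016_09_12.tar.gz HTTP/1.1\r\n"
    "Host: www.intomics.com\r\n"
    "User-Agent: PycURL/7.43.0 libcurl/7.53.1\r\n"
    "Accept: */*\r\n"
    "Cookie: access-token=<token>\r\n"
)


def cookie_names(raw_request):
    """Return the names of all cookies sent in a raw HTTP request."""
    names = []
    for line in raw_request.split("\r\n"):
        key, sep, value = line.partition(":")
        # Header names are case-insensitive, so compare in lower case.
        if sep and key.strip().lower() == "cookie":
            for pair in value.split(";"):
                names.append(pair.split("=", 1)[0].strip())
    return names


print(cookie_names(CURL_REQUEST))    # → ['access_token']
print(cookie_names(PYCURL_REQUEST))  # → ['access-token']
```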

Edit 2: example with requests:

import requests
import time
import json

t = int(time.time() * 1000) - 3600000

url   = 'https://www.intomics.com/inbio/map/api/'\
        'get_data?file=InBio_Map_core_2016_09_12.tar.gz'
login = 'https://www.intomics.com/inbio/api/login_guest?ref=&_=%u' % t

r0 = requests.get(login)
token = json.loads(r0.text)['token']
hdrs = {'Cookie': 'access-token=%s' % token}

with open('imweb.tmp.tar.gz', 'wb') as fp:
    r1 = requests.get(url, headers=hdrs, stream=True)
    for block in r1.iter_content(4096):
        fp.write(block)

Oddly, this too results in an HTTP 500 error. So it is not a bug in pycurl, but then there must be some difference between the bash or Firefox version and the pycurl or requests examples.
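
One difference is visible in the snippets themselves: the bash command sends the cookie as `access_token` (underscore), while both Python versions send `access-token` (hyphen). Below is a minimal sketch of the `requests` variant with the underscore name, assuming this mismatch is what the server rejects; that has not been confirmed. `auth_headers` and `download` are hypothetical helper names introduced only for illustration.

```python
import time

LOGIN_URL = 'https://www.intomics.com/inbio/api/login_guest?ref=&_=%u'
DATA_URL = ('https://www.intomics.com/inbio/map/api/'
            'get_data?file=InBio_Map_core_2016_09_12.tar.gz')
COOKIE_NAME = 'access_token'  # underscore, as in the working curl command


def auth_headers(token):
    """Build the Cookie header the same way the bash curl call does."""
    return {'Cookie': '%s=%s' % (COOKIE_NAME, token)}


def download(outfile='imweb.tmp.tar.gz'):
    """Log in as guest, then stream the archive to outfile."""
    import requests  # imported here so auth_headers works without requests

    t = int(time.time() * 1000) - 3600000
    token = requests.get(LOGIN_URL % t).json()['token']
    r1 = requests.get(DATA_URL, headers=auth_headers(token), stream=True)
    # Raise on HTTP 500 instead of silently saving the error page.
    r1.raise_for_status()
    with open(outfile, 'wb') as fp:
        for block in r1.iter_content(4096):
            fp.write(block)
```

Calling `download()` performs the two requests. The pycurl version would need the same one-character change in its `hdrs` list.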

deeenes
  • Other headers -- User-Agent, for example? – pbuck Apr 10 '17 at 15:25
  • @pbuck: first I got the curl statements from Firefox's inspector, then I removed the unnecessary headers testing if it still works. In Python too I tested with exactly the same headers as in Firefox (`User-Agent: Mozilla...` etc), and the result is the same – deeenes Apr 11 '17 at 07:52
  • work with python requests? (Wonder if bug in pycurl) – pbuck Apr 11 '17 at 13:03
  • thanks @pbuck, I tried `requests` and it gives me `HTTP 500` error. but then the question again, what makes the difference? – deeenes Apr 12 '17 at 09:43
