1

I need to download a batch of PDF files from the web. I usually use the urllib3 library, but this is a corporate website that requires authentication. I can download a normal HTML page using the following:

import win32com.client

url = 'https://corpweb.example/index.html'
h = win32com.client.Dispatch('WinHTTP.WinHTTPRequest.5.1')
h.SetAutoLogonPolicy(0)
h.Open('GET', url, False)
h.Send()
result = h.responseText

But this solution doesn't work with a PDF:

url = "https://corpweb.example/file.pdf"
h = win32com.client.Dispatch('WinHTTP.WinHTTPRequest.5.1')
h.SetAutoLogonPolicy(0)
h.Open('GET', url, False)
h.Send()
with open(filename, 'wb') as f:
    f.write(h.responseText)

I get an error:

TypeError: a bytes-like object is required, not 'str'

What can I do?

user3840170

3 Answers

1

As Microsoft’s documentation of WinHttpRequest explains, responseText contains the response body as Unicode text. To obtain the response body as raw bytes, use responseBody instead.

Also consider using responseStream instead of either, to avoid keeping the entire file in memory at once.
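As a minimal sketch of the responseBody approach: the helper names `save_response_body` and `download` are made up for illustration, and the sketch assumes that pywin32 hands back responseBody as a buffer-like object (typically a memoryview in Python 3) that `bytes()` can copy. The status check is an addition that the original snippet does not have.

```python
def save_response_body(request, path):
    """Write the raw response bytes of a completed request to path.

    `request` is assumed to expose a ResponseBody attribute holding a
    bytes-like object, as a WinHttpRequest COM object does via pywin32.
    """
    data = bytes(request.ResponseBody)  # copy the buffer into a bytes object
    with open(path, 'wb') as f:
        f.write(data)
    return len(data)


def download(url, path):
    """Fetch url with the current Windows credentials and save it to path."""
    import win32com.client  # pywin32, available on Windows only

    h = win32com.client.Dispatch('WinHTTP.WinHTTPRequest.5.1')
    h.SetAutoLogonPolicy(0)
    h.Open('GET', url, False)
    h.Send()
    if h.Status != 200:
        raise RuntimeError('unexpected HTTP status %s for %s' % (h.Status, url))
    return save_response_body(h, path)
```

Calling `download('https://corpweb.example/file.pdf', 'file.pdf')` would then replace the failing `f.write(h.responseText)` from the question. The whole body is still held in memory, so this suits moderately sized files.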

user3840170
-1

Try using urllib.request.urlretrieve(url, filepath)?

import urllib.request

url = "https://corpweb/file.pdf"
# Keep the module name and the URL variable distinct so neither shadows the other
urllib.request.urlretrieve(url, "file.pdf")

It may be the best solution. Or you can use requests:

import requests

url = "https://corpweb/file.pdf"
resp = requests.get(url)  # Fetch the response
resp.raise_for_status()  # Fail early on an HTTP error status
# Opening in 'wb' mode creates the file, so no separate creation step is needed
with open("file.pdf", "wb") as f:
    f.write(resp.content)  # Write the raw bytes
electromeow
-2

File open mode:

with open(fname, 'rb') as f:
    ...

This means that all data read from the file is returned as bytes objects, not str. You cannot then use a string in a containment test:

if 'some-pattern' in tmp:
    continue
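A small self-contained illustration of that mismatch, using made-up sample data rather than a real file: a str pattern tested against bytes raises the same TypeError reported in the question, while a bytes pattern works.

```python
data = b'%PDF-1.4 sample content'  # stands in for bytes read in 'rb' mode

# A str pattern cannot be searched for inside a bytes object.
try:
    'PDF' in data
    message = None
except TypeError as exc:
    message = str(exc)

# A bytes pattern (note the b prefix) is compared byte for byte.
found = b'PDF' in data

print(message)
print(found)
```

The same rule applies in the other direction: writing to a file opened in 'wb' mode needs a bytes-like object, not a str.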
user3840170