import asyncio
import pyppeteer
import logging
from pyppeteer import launch

pyppeteer.DEBUG = True
for name in logging.root.manager.loggerDict:
    logging.getLogger(name).disabled = True

async def main():
    browser = await launch(headless = False)
    page = await browser.newPage()
    await page.setJavaScriptEnabled(True)
    response = await page.goto('http://www.africau.edu/images/default/sample.pdf',
                                time = 3000, waitUntil = ['domcontentloaded', 'load', 'networkidle0'])
    content = await response.buffer()
    print(content)
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())

Expected output: the content of http://www.africau.edu/images/default/sample.pdf

Actual output: b'df48fcc4-a0b0-4e86-b52e-0ec012ee791e'

Environment: Python 3, Ubuntu Linux

Alex
  • I've been at this for hours with no success; there is a definite lack of documentation in this area. I was able to get the intended response using Python requests and simply reading the response body, which may be a much easier workaround if all else fails. – Keegan Murphy Jan 16 '22 at 20:25
  • Wouldn't it be a lot easier to just use `requests`? – Roland Smith Jan 23 '22 at 13:20
  • This was answered here: https://stackoverflow.com/questions/49665650/how-to-obtain-a-pdf-embedded-in-page-through-puppeteer – first last Jan 23 '22 at 18:43

2 Answers


I'd suggest using pyppdf. It is built on Pyppeteer (the Python port of Puppeteer) and prints pages to PDF.

conda install -c defaults -c conda-forge pyppdf
OR
pip install pyppdf

It has a handy function, save_pdf:

def save_pdf(output_file: str=None, url: str=None, html: str=None,
            args_dict: Union[str, dict]=None,
            args_upd: Union[str, dict]=None,
            goto: str=None, dir_: str=None) -> bytes:

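Or, from inside your coroutine, you could simply capture the rendered page (note that page.pdf() is only supported in headless mode):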

await page.screenshot({'path': 'ss.png'})
await page.pdf({'path': 'sample.pdf'})
Pixel Paras

I'm aware that you are asking for a solution using pyppeteer, but honestly this is much easier to do with requests.


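import requests


def main():
    # Download the PDF directly; no browser is needed for a static file.
    r = requests.get("http://www.africau.edu/images/default/sample.pdf")
    r.raise_for_status()  # stop on an HTTP error instead of saving an error page
    with open("sample.pdf", "wb") as file:
        file.write(r.content)  # r.content holds the raw response bytes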

if __name__ == "__main__":
    main()

That's all; the PDF will be saved to a file called sample.pdf.
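Whichever way you fetch the bytes, a quick sanity check can tell you whether you actually got a PDF: PDF files begin with the signature %PDF-, which the b'df48fcc4-...' value from the question does not. Here is a minimal sketch; the names PDF_MAGIC and looks_like_pdf are made up for illustration and are not part of requests or pyppeteer.

```python
# Hypothetical helper (not part of requests or pyppeteer): checks whether a
# downloaded payload starts with the PDF file signature.
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the payload begins with the PDF signature."""
    return data.startswith(PDF_MAGIC)


if __name__ == "__main__":
    # A payload shaped like the start of a real PDF file.
    print(looks_like_pdf(b"%PDF-1.3\n%rest of file"))  # True
    # The value the question reported getting from response.buffer().
    print(looks_like_pdf(b"df48fcc4-a0b0-4e86-b52e-0ec012ee791e"))  # False
```

You could call looks_like_pdf(r.content) before writing the file, or on the result of response.buffer() in the pyppeteer version, to spot this kind of problem early.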

Gealber