Is it possible to get pdf page using pyppeteer?

Question

import asyncio
import pyppeteer
import logging
from pyppeteer import launch

pyppeteer.DEBUG = True
for name in logging.root.manager.loggerDict:
    logging.getLogger(name).disabled = True

async def main():
    browser = await launch(headless = False)
    page = await browser.newPage()
    await page.setJavaScriptEnabled(True)
    response = await page.goto('http://www.africau.edu/images/default/sample.pdf',
                                time = 3000, waitUntil = ['domcontentloaded', 'load', 'networkidle0'])
    content = await response.buffer()
    print(content)
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())

Expected output: the content of http://www.africau.edu/images/default/sample.pdf

Actual output: b'df48fcc4-a0b0-4e86-b52e-0ec012ee791e'

Environment: Python 3, Ubuntu Linux

I’ve been trying this for hours with no success; documentation in this area is definitely lacking. I was able to replicate the intended response using Python requests and simply parsing the response body as text, which may be a much easier workaround if all else fails. — Keegan Murphy, Jan 16 '22 at 20:25
This was answered here: https://stackoverflow.com/questions/49665650/how-to-obtain-a-pdf-embedded-in-page-through-puppeteer — first last, Jan 23 '22 at 18:43

score 0 · Answer 1 · answered Jan 18 '22 at 10:56

I'd suggest using pyppdf, a PDF tool built on top of pyppeteer (the Python port of Puppeteer).

conda install -c defaults -c conda-forge pyppdf
OR
pip install pyppdf

It has a handy function, save_pdf:

def save_pdf(output_file: str=None, url: str=None, html: str=None,
            args_dict: Union[str, dict]=None,
            args_upd: Union[str, dict]=None,
            goto: str=None, dir_: str=None) -> bytes:

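Or you could simply use pyppeteer's own methods. Note that page.pdf only works in headless mode, and it prints the rendered page rather than downloading an existing PDF file: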

await page.screenshot({'path': 'ss.png'})
await page.pdf({'path': 'sample.pdf'})
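
For context, here is a minimal sketch of how the page.pdf call above might sit in a complete script. The helper names build_pdf_options and render_page_to_pdf, the example URL in the usage comment, and the chosen option values are assumptions for illustration, not part of the original answer. It assumes an HTML page is being printed, since page.pdf renders the page rather than fetching an existing PDF.

```python
def build_pdf_options(path: str) -> dict:
    """Options passed to page.pdf(); the values here are illustrative."""
    return {
        "path": path,             # where pyppeteer writes the file
        "format": "A4",           # paper size
        "printBackground": True,  # include CSS backgrounds in the output
    }


async def render_page_to_pdf(url: str, path: str) -> bytes:
    """Print the rendered page at `url` to a PDF file and return its bytes."""
    # Imported here so build_pdf_options can be used without pyppeteer installed.
    from pyppeteer import launch

    # page.pdf() is only supported when Chromium runs headless.
    browser = await launch(headless=True)
    try:
        page = await browser.newPage()
        await page.goto(url, {"waitUntil": "networkidle0"})
        return await page.pdf(build_pdf_options(path))
    finally:
        # Close the browser even if navigation or printing fails.
        await browser.close()


# Usage (hypothetical URL; needs pyppeteer and its Chromium download):
# import asyncio
# asyncio.run(render_page_to_pdf("https://example.com", "page.pdf"))
```

The try/finally is there so a failed navigation does not leave a Chromium process running.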

score 0 · Answer 2 · answered Jan 23 '22 at 12:08

I'm aware that you are asking for a solution using pyppeteer, but honestly this is much easier to do with requests.


import requests


def main():
    r = requests.get("http://www.africau.edu/images/default/sample.pdf")
    with open("sample.pdf", "wb") as file:
        file.write(r.content)

if __name__ == "__main__":
    main()

That's all; the PDF will be saved to a file called sample.pdf.
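
Whichever approach is used, it may be worth checking that the bytes really are a PDF before writing them, since the question shows a case where something else came back (b'df48fcc4-a0b0-4e86-b52e-0ec012ee791e'). The sketch below is one possible check, assuming the file begins with the usual %PDF- header; the names PDF_MAGIC, looks_like_pdf and save_pdf_bytes are made up for this example.

```python
from pathlib import Path

# PDF files normally begin with this signature, followed by the version.
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the bytes start with the PDF header signature."""
    return data.startswith(PDF_MAGIC)


def save_pdf_bytes(data: bytes, path: str) -> Path:
    """Write data to path, refusing anything that does not look like a PDF."""
    if not looks_like_pdf(data):
        # Show the first bytes so the caller can see what was received instead.
        raise ValueError(f"not a PDF, got {data[:40]!r}")
    target = Path(path)
    target.write_bytes(data)
    return target
```

In the requests code above, this would replace the open/write pair with save_pdf_bytes(r.content, "sample.pdf").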
