
I would like to ask for help, although I don't know if it is appropriate, since I'm really a newbie to most programming topics.

Let's start from the beginning. I'm trying to gather a lot of data from some websites by saving pages as PDFs and then transferring the statistical data into text format for further use and analysis. The websites consist of medical data which cannot be accessed in bulk in any other way. The data is necessary for my dissertation, and being able to perform an analysis on it would be very helpful. My problem is as follows:

  1. No option for printing a page as a PDF works in any of the major browsers; however, the "save as a PDF" option in the Opera browser does, and that's the only way to actually get the whole data. Other ways of creating a PDF from a given page produce a nearly empty page: for example, there are objects but no data (numbers) describing them. I've tried all sorts of tinkering with the PDF printing, to no avail.

  2. I have tried several programs available on the web that claim to be built specifically for creating PDFs from a given URL (Adobe Acrobat included), but none of them gives an output that is even remotely satisfying: all I get is 'loading application' on an otherwise empty page. From the little information I've managed to put together, it seems that the software can't properly load the web page before actually creating a PDF out of it. Please do correct me if I'm wrong.

To the point: I could list the countless attempts I have made to find another way, but there seems to be no other solution than to automate the PDF creation in the Opera web browser, which brings me to you gentlemen.

Would you please help me automate the process of opening a given URL (preferably from a saved list of URLs) and then creating a PDF from that web page, all in the Opera web browser?

Steps taken so far

  1. I have found out that Chrome snippets used to work in Opera via a certain extension, but they no longer do.

  2. I have also found out that there are browser testing programs which could do the job, if you know how to write a certain task (running in a loop?).

  3. I have also managed to install Playwright on my Windows 10 machine, but couldn't find a way to connect it to the Opera web engine in order to take control of the browser's behavior.

  4. I've managed to overdose on hydroxyzine a couple of times.

None of this really brings me much closer to achieving my goal, so please kindly help me if you will.

Thank you very much in advance.

Best regards,
Robert

robertmer

1 Answer

Opera is, as you found, unique in saving a PDF as one continuous page. However, it too may save pages at conventional sizes under certain conditions. So I would say it is perhaps simpler to save a PDF in ANY faster browser and then stitch the pages together.

Here is a conventional "save as" of 6 pages about Hydroxyzine, so using Opera gives nothing special here.


However, to stitch those pages we can use a single command line. As discussed here https://stackoverflow.com/a/76783553/10802527 the key is the cross-platform library cpdf, most simply with the instruction:

cpdf -impose-xy "1 0" Hydroxyzine.pdf -o Output.pdf

This generates a single-page Output.pdf.
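If many saved PDFs need the same treatment, the cpdf call above can be repeated from a small script. The following is only a minimal Python sketch, assuming cpdf is on the PATH; the function names and the idea of an input and output folder are hypothetical choices for illustration, not part of cpdf.

```python
import subprocess
from pathlib import Path


def stitch_command(src: Path, dst: Path) -> list[str]:
    """Build the same cpdf call as above: impose all pages in one column ("1 0")."""
    return ["cpdf", "-impose-xy", "1 0", str(src), "-o", str(dst)]


def stitch_folder(in_dir: Path, out_dir: Path, dry_run: bool = False) -> list[list[str]]:
    """Stitch every PDF found in in_dir into a single-page PDF in out_dir.

    With dry_run=True the commands are only built and returned, not executed.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    commands = []
    for src in sorted(in_dir.glob("*.pdf")):
        cmd = stitch_command(src, out_dir / src.name)
        commands.append(cmd)
        if not dry_run:
            # check=True raises CalledProcessError if cpdf exits with an error
            subprocess.run(cmd, check=True)
    return commands
```

Calling stitch_folder(Path("pdfs"), Path("stitched")) would then write one stitched file per input PDF; the folder names here are placeholders.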


You asked about programming "Automated Saving". The reason I said "under certain conditions", and that it is best to save via conventional means, is shown by the automated output from that same page, produced with the command below. Note it is only 5 pages, so something is missing: the JavaScripted accordion section was not expanded.

opera --headless --print-to-pdf=output.pdf https://www.drugs.com/hydroxyzine.html
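To run that command over a saved list of URLs, as the question asks, a loop along these lines might do. This is a rough Python sketch under several assumptions: the opera executable is on the PATH, the URLs sit one per line in a text file, and the file naming scheme is invented here for illustration. It inherits the limitation noted above, so sections that need JavaScript interaction (such as the accordion) will still be missing.

```python
import subprocess
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse

BROWSER = "opera"  # assumption: the executable is on PATH, as in the command above


def pdf_name(index: int, url: str) -> str:
    """Name the output after the last URL path segment, prefixed with a counter."""
    parsed = urlparse(url)
    stem = PurePosixPath(parsed.path.rstrip("/")).stem or parsed.netloc
    return f"{index:03d}_{stem}.pdf"


def print_command(url: str, out_file: Path) -> list[str]:
    """Build the headless print-to-pdf call shown above for one URL."""
    return [BROWSER, "--headless", f"--print-to-pdf={out_file}", url]


def save_all(url_file: Path, out_dir: Path, dry_run: bool = False) -> list[list[str]]:
    """Save every URL listed in url_file (one per line) as a PDF in out_dir.

    Blank lines and lines starting with '#' are skipped.
    With dry_run=True the commands are only built and returned, not executed.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    lines = url_file.read_text(encoding="utf-8").splitlines()
    urls = [s.strip() for s in lines if s.strip() and not s.strip().startswith("#")]
    commands = []
    for index, url in enumerate(urls, start=1):
        out_file = (out_dir / pdf_name(index, url)).resolve()
        cmd = print_command(url, out_file)
        commands.append(cmd)
        if not dry_run:
            # Give each page up to two minutes; raises on timeout or non-zero exit
            subprocess.run(cmd, check=True, timeout=120)
    return commands
```

For example, save_all(Path("urls.txt"), Path("pdfs")) would produce files such as 001_hydroxyzine.pdf, which could then be stitched with cpdf. The counter prefix is there only so that two URLs ending in the same segment do not overwrite each other.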


K J