
I use PyQt5.QtWebEngineWidgets to display a PDF. I added links over certain words in this PDF with PyMuPDF, using this code:

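import fitz  # PyMuPDF

def create_pdf_with_links():
    doc = fitz.open("test.pdf")
    for page in doc:
        # find every occurrence of the word and put a URI link over it
        text_instances = page.search_for("Kontrollstelle")
        for instance in text_instances:
            # kind 2 is fitz.LINK_URI
            d = {'kind': 2, 'xref': 0, 'from': instance, 'uri': 'data:Hallo Welt', 'id': ''}
            page.insert_link(d)

    # saving back to the file that was opened has to be incremental
    doc.saveIncr()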

Now I want to read the data stored in a link when I click on it. To do that, I subclassed QWebEnginePage and used a `data:` link scheme, based on another question of mine. Unfortunately, nothing happens when I click on the link. When I use an ordinary URL as the link target instead, the corresponding page loads. The code for displaying the PDF and handling the link looks like this:

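import os
import sys

from PyQt5.QtCore import QUrl, pyqtSignal
from PyQt5.QtWidgets import QApplication, QMainWindow
from PyQt5.QtWebEngineWidgets import (
    QWebEnginePage, QWebEngineSettings, QWebEngineView)

class MyWebEnginePage(QWebEnginePage):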
    dataLinkClicked = pyqtSignal(str)

    def acceptNavigationRequest(self, url,  _type, isMainFrame):
        if (_type == QWebEnginePage.NavigationTypeLinkClicked and
            url.scheme() == 'data'):
            # send only the url path
            self.dataLinkClicked.emit(url.path())
            return False
        return super().acceptNavigationRequest(url,  _type, isMainFrame)

class App(QMainWindow):
    def __init__(self):
        super(App, self).__init__()
        self.pdf_path = os.path.abspath("test.pdf")
        webView = QWebEngineView()
        page = MyWebEnginePage(self)
        # connect to the signal
        page.dataLinkClicked.connect(self.handleDataLink)
        webView.setPage(page)
        # enable the built-in PDF viewer
        webView.settings().setAttribute(QWebEngineSettings.PluginsEnabled, True)
        webView.settings().setAttribute(QWebEngineSettings.PdfViewerEnabled, True)
        webView.setUrl(QUrl.fromLocalFile(self.pdf_path))
        self.setCentralWidget(webView)

    def handleDataLink(self, text):
        print(text) # should print the link: "Hallo Welt"


if __name__ == '__main__':
    app = QApplication(sys.argv)
    window = App()
    window.setGeometry(800, 100, 1000, 800)
    window.show()
    sys.exit(app.exec_())
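
For reference, `acceptNavigationRequest` above expects the link target to split into a `data` scheme and a path that carries the payload. The sketch below shows that split using the standard library's `urllib.parse` as a stand-in for `QUrl`. This is an assumption made for illustration: `QUrl` may normalise or encode the string differently, and `split_data_link` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def split_data_link(uri: str):
    """Split a link target into (scheme, payload).

    Stand-in for QUrl.scheme() / QUrl.path(); QUrl itself may treat
    spaces or other special characters differently.
    """
    parts = urlsplit(uri)
    return parts.scheme, parts.path


if __name__ == "__main__":
    scheme, payload = split_data_link("data:Hallo Welt")
    # Only targets with the "data" scheme would be passed to handleDataLink.
    if scheme == "data":
        print(payload)
```

If the viewer forwards the click at all, the page only has to check the scheme and emit the path, which is what `MyWebEnginePage` already does.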

How do I get the data from the link?

Mazze
  • The built-in chrome pdf viewer won't propagate link navigation events. You should try the [pdfjs viewer](https://stackoverflow.com/a/48053017/984421) instead - but if you're using pyqt5, *make sure you use the legacy build of pdfjs*. – ekhumoro Oct 15 '22 at 13:49
  • I downloaded the prebuilt version 2.16.105 and added the code as shown in the link (changing the path, obviously). But when I run the code I get `Your file was not found` in the web window. I use macOS 12.6, PyQt5==5.15.7 and Python 3.9.9. – Mazze Oct 15 '22 at 18:59
  • The example code works fine. You obviously did something wrong, but it's impossible to diagnose properly without seeing the actual code you used. Probably you aren't using the correct file-url syntax. It must be ***exactly*** as shown in the example, including the `file:` scheme. – ekhumoro Oct 15 '22 at 19:51
  • Forgot the `file:` (facepalm). Now it works. However, the links I added are not clickable anymore. When I open the pdf with adobe, the links are there. – Mazze Oct 15 '22 at 20:00
  • I tested a pdf with links. For me, the links are clickable, and also show up in `acceptNavigationRequest` as expected. Obviously there is something wrong with the links you added. Do normal http urls work, or is it just data urls? Can you try a pdf with "normal" links (i.e. ones you didn't add yourself)? – ekhumoro Oct 15 '22 at 20:22
  • Using "normal" links works fine and adding http urls also works. The problem only occurs with data urls. – Mazze Oct 16 '22 at 08:10
  • If I add internal links like this `link = {'kind': 4, 'from': instance, 'name': "data:Artikel_1", 'id': ''}`, the links are shown. Is it possible to catch the navigation request for internal links? – Mazze Oct 16 '22 at 11:13
  • It's very easy to patch the source code for `pdfjs` so that it supports data-urls. In the files *build/pdf.js* and *build/pdf.worker.js*, search for `switch (url.protocol)`. This reveals that the only supported url schemes are "http:", "https:", "ftp:", "mailto:" and "tel:". However, I found that simply adding `case "data:":` in both files is enough to fix the issue. You can then add uri links as in your example code, which will show up in `acceptNavigationRequest` when clicked. – ekhumoro Oct 16 '22 at 18:46
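
The patch described in the last comment could be scripted roughly as follows. This is a minimal sketch, not a tested tool: it assumes the `switch (url.protocol) {` statement sits on a line of its own in both files, and `add_data_scheme` and `patch_file` are hypothetical helper names. Only the file names, the search string and the added `case "data:":` line come from the comment.

```python
import re
from pathlib import Path

# Matches the line that opens the protocol switch, capturing its indentation.
SWITCH_RE = re.compile(
    r'^([ \t]*)switch \(url\.protocol\) \{[ \t]*\n', re.MULTILINE)

DATA_CASE = 'case "data:":'


def add_data_scheme(source: str) -> str:
    """Return source with a `case "data:":` label added to the protocol switch."""
    if DATA_CASE in source:
        return source  # assume the file is already patched

    def _insert(match):
        indent = match.group(1)
        # Put the new label directly after the opening line of the switch.
        return f'{match.group(0)}{indent}  {DATA_CASE}\n'

    return SWITCH_RE.sub(_insert, source)


def patch_file(path: Path) -> bool:
    """Patch one file in place; return True if it was changed."""
    original = path.read_text(encoding="utf-8")
    patched = add_data_scheme(original)
    if patched == original:
        return False
    path.write_text(patched, encoding="utf-8")
    return True
```

It would be called once per file, for example `patch_file(Path(pdfjs_root) / "build" / "pdf.js")` and the same for `build/pdf.worker.js`, where `pdfjs_root` stands for wherever the pdfjs distribution was unpacked. Keeping a backup of both files first is advisable.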

0 Answers