Using Python 3 and Poppler, I can load files with new_from_file without any problem, but new_from_data is problematic. The code below is only a simple test: reading a file and then calling new_from_data makes little sense when new_from_file works perfectly, but I could not post here the full code that generates the PDF file.

from gi.repository import Poppler, Gtk

def draw(widget, cr):
        # set background.
        cr.set_source_rgb(0.7, 0.6, 0.5)
        cr.paint()

        # set page background
        cr.set_source_rgb(1, 1, 1)
        cr.rectangle(0,0,800,400)

        cr.fill()
        page.render(cr)

filepath = "d:/Mes Documents/A5.pdf" 
f11 = open(filepath, "r", encoding = "cp850")
data1 = f11.read()
f11.close()

document = Poppler.Document.new_from_data(data1, len(data1),  None)
page = document.get_page(0)
print (document.get_n_pages())


window = Gtk.Window(title="Hello World")
window.connect("delete-event", Gtk.main_quit)
window.connect("draw", draw)
window.set_app_paintable(True)

window.show_all()
Gtk.main()

Four different situations may occur:

  • With a very simple PDF (the "Hello world" example in Pdf Reference 13), it works.
  • With a normal file, there may be no error, but get_n_pages returns 0 and get_page(0) returns None.
  • Or I may get an error: GLib.Error: poppler-quark: PDF document is damaged (4)
  • Or the program crashes.

I wonder whether the problem lies with the encoding parameter, but I tried everything I could think of without success. I also tried opening with "rb" and then converting the bytes array to a string with:

data1 = "".join(map(chr, data1))

No result.

Searching on Google never returned a working example.

Dysmas

2 Answers

I ran into the same problem and solved it using Gio.MemoryInputStream. Not really elegant, but it works...

from gi.repository import Poppler, Gtk, Gio

def draw(widget, cr):
        # set background.
        cr.set_source_rgb(0.7, 0.6, 0.5)
        cr.paint()

        # set page background
        cr.set_source_rgb(1, 1, 1)
        cr.rectangle(0,0,800,400)

        cr.fill()
        page.render(cr)

filepath = "d:/Mes Documents/A5.pdf" 
with open(filepath, "rb") as f11:
    input_stream = Gio.MemoryInputStream.new_from_data(f11.read())
    # Take care that you need to call .close() on the Gio.MemoryInputStream once you're done with your pdf document.

document = Poppler.Document.new_from_stream(input_stream, -1, None, None)
page = document.get_page(0)
print (document.get_n_pages())


window = Gtk.Window(title="Hello World")
window.connect("delete-event", Gtk.main_quit)
window.connect("draw", draw)
window.set_app_paintable(True)

window.show_all()
Gtk.main()
Trap
  • Thanks, @Trap, it works fine except for one point: calling .close() on the Gio.MemoryInputStream does not free the memory, and neither does del. Since I use this code for viewing PDF pages, each page viewed increases memory use, which may end up reaching gigabytes. – Dysmas Aug 19 '17 at 16:09
  • That's weird, the documentation is pretty explicit about the purpose of close(). Actually, if I understand the docs correctly, you don't even need to call it, since the resources of the stream should be released once the stream object is garbage collected. http://lazka.github.io/pgi-docs/#Gio-2.0/classes/InputStream.html#Gio.InputStream.close – Trap Aug 21 '17 at 16:54
  • I posted my test code in a new question: https://stackoverflow.com/questions/45838863/gio-memoryinputstream-does-not-free-memory-when-closed/45869003#45869003. The answer explains that this is a bug in new_from_data and recommends using new_from_bytes instead. Tested, it works. – Dysmas Aug 24 '17 at 19:07

It works if you read the file as binary ("rb") and without an encoding. I also needed to remove the data length argument to fix TypeError: Poppler.Document.new_from_data() takes exactly 2 arguments (3 given) (this may differ between poppler versions).

#!/bin/python3
from gi.repository import Poppler, Gtk

def draw(widget, cr):
        # set background.
        cr.set_source_rgb(0.7, 0.6, 0.5)
        cr.paint()

        # set page background
        cr.set_source_rgb(1, 1, 1)
        cr.rectangle(0,0,800,400)

        cr.fill()
        page.render(cr)

filepath = "/home/da/test.pdf"
f11 = open(filepath, "rb")
data1 = f11.read()
f11.close()

document = Poppler.Document.new_from_data(data1, None)
page = document.get_page(0)
print (document.get_n_pages())


window = Gtk.Window(title="Hello World")
window.connect("delete-event", Gtk.main_quit)
window.connect("draw", draw)
window.set_app_paintable(True)

window.show_all()
Gtk.main()

Tested with poppler 0.84.0 and Python 3.8.5 on Fedora Linux.
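
To illustrate why binary mode matters here, a small sketch that needs neither Poppler nor a real file. It assumes two things that are not stated above: that text mode translates "\r\n" line endings to "\n" (Python's default universal-newline behaviour), and that a str handed to a C function expecting char* ends up encoded as UTF-8. The sample bytes are made up to resemble the start of a PDF.

```python
# Made-up bytes resembling the start of a PDF: the second line holds four
# bytes above 127, as many PDFs do to mark themselves as binary files.
raw = b"%PDF-1.4\r\n%\xe2\xe3\xcf\xd3\r\n1 0 obj\r\n"

# Roughly what open(path, "r", encoding="cp850").read() produces:
# decode every byte as cp850, then translate "\r\n" to "\n".
text = raw.decode("cp850").replace("\r\n", "\n")

# What the C side would see if that str were passed on as UTF-8 (assumption).
round_tripped = text.encode("utf-8")

print("bytes on disk:        ", len(raw))
print("characters after read:", len(text))
print("bytes after UTF-8:    ", len(round_tripped))
print("identical to file:    ", round_tripped == raw)
```

Since a PDF's cross-reference table stores byte offsets, any change in length or content like this would plausibly explain the "PDF document is damaged" error, and also why a pure-ASCII "Hello world" PDF survives.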

dreua
  • It surely depends on the poppler version. In the only version available for Windows, which is old, I receive the error message: TypeError: Must be string, not bytes. The solution with Gio.MemoryInputStream works fine. – Dysmas Aug 21 '20 at 11:36
  • Versions added. – dreua Aug 22 '20 at 12:57
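
Since which call works seems to depend on the installed poppler version, the two approaches from this thread could be combined into one helper. This is only a sketch: load_pdf_from_bytes is a hypothetical name, and the modules are passed in as arguments purely so the snippet can be read and tried without a GTK install. It assumes the two-argument new_from_data(data, password) form from the second answer, and uses new_from_bytes with GLib.Bytes for the stream, as recommended in the comments on the first answer.

```python
def load_pdf_from_bytes(data, poppler, gio, glib, password=None):
    """Open a PDF held in memory (hypothetical helper, not part of Poppler).

    `poppler`, `gio` and `glib` are expected to be the modules imported
    from gi.repository; `data` is the file content read in "rb" mode.
    """
    try:
        # Form reported to work with poppler 0.84.0 in this thread.
        return poppler.Document.new_from_data(data, password)
    except TypeError:
        # Other bindings reject this call (wrong argument count, or
        # "Must be string, not bytes"), so fall back to a memory stream.
        stream = gio.MemoryInputStream.new_from_bytes(glib.Bytes.new(data))
        # Arguments: stream, length (-1 = unknown), password, cancellable.
        return poppler.Document.new_from_stream(stream, -1, password, None)


# Intended use (requires PyGObject and Poppler to be installed):
#   from gi.repository import Poppler, Gio, GLib
#   with open(filepath, "rb") as f:
#       document = load_pdf_from_bytes(f.read(), Poppler, Gio, GLib)
```

Catching TypeError is a guess based on the two error messages quoted above; other versions may fail differently.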