
I am using libpoppler-qt5.so and the following code to extract text from a PDF document:

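#include <memory>
#include <stdexcept>
#include <QByteArray>
#include <QRect>
#include <QString>
#include <poppler-qt5.h>

QString pdf2txt(const char *buf, size_t len)
{
    // loadFromData() returns a caller-owned pointer (or nullptr); unique_ptr frees it on every path.
    Poppler::Document* document = Poppler::Document::loadFromData(QByteArray(buf, static_cast<int>(len)));
    std::unique_ptr<Poppler::Document> doc_del(document);
    if (!document || document->isLocked()) throw std::runtime_error("pdf2txt document is locked or unavailable");

    const int pages = document->numPages();
    QString dst;
    for (int i = 0; i < pages; ++i)
    {
        Poppler::Page* page = document->page(i);
        if (!page) throw std::runtime_error("bad pdf document");
        std::unique_ptr<Poppler::Page> page_del(page);
        dst += page->text(QRect()); // a null rectangle requests all text on the page
    }

    return dst;
}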

However, it segfaults when it is called from several threads at once; with a single thread it seems to work. Is this code thread-safe? Are there any other thread-safe libraries for extracting text from PDF documents? Thank you.
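If the library build in use turns out not to be safe for concurrent use, a common workaround is to serialize every call into it with one process-wide mutex, so the rest of the program can stay multi-threaded while only one thread at a time is inside the library. The sketch below assumes that workaround; `extract_text`, `extract_text_serialized`, `extract_all` and `g_extract_mutex` are hypothetical names, and `extract_text` is a stand-in for `pdf2txt` so the sketch compiles without Poppler. The trade-off is that text extraction itself no longer runs in parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for pdf2txt(): real code would call into Poppler here;
// this version only builds a string so the sketch compiles on its own.
std::string extract_text(const std::string &pdf_bytes)
{
    return "text:" + pdf_bytes;
}

// One process-wide mutex guards every call into the library
// (assumption: its internal state must not be used by two threads at once).
std::mutex g_extract_mutex;

std::string extract_text_serialized(const std::string &pdf_bytes)
{
    std::lock_guard<std::mutex> lock(g_extract_mutex); // one caller at a time
    return extract_text(pdf_bytes);
}

// Runs the serialized extractor for each input on its own thread.
std::vector<std::string> extract_all(const std::vector<std::string> &inputs)
{
    std::vector<std::string> results(inputs.size());
    std::vector<std::thread> workers;
    workers.reserve(inputs.size());
    for (std::size_t i = 0; i < inputs.size(); ++i)
    {
        // Each thread writes only its own slot, so results needs no extra lock.
        workers.emplace_back([&inputs, &results, i] {
            results[i] = extract_text_serialized(inputs[i]);
        });
    }
    for (std::thread &t : workers)
        t.join();
    return results;
}
```

For example, `extract_all({"a", "b"})` returns `{"text:a", "text:b"}`, with the two calls into `extract_text` never overlapping.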

Alex
  • I'm done with this library: it segfaults and produces low-quality text. I wrote my own library for extracting text instead: https://github.com/uvoteam/pdf_extract – Alex Jul 26 '20 at 08:27

1 Answer


According to Bug 50992 and the release notes, Poppler is now thread-safe.

However, I noticed the following line:

dst += page->text(QRect());

accesses the raw page pointer, even though you previously handed that pointer to the unique_ptr page_del, which is never used afterwards. Could the unique_ptr have deleted the page before you access the pointer?
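That hypothesis can be checked in isolation. In C++, a std::unique_ptr deletes the object it owns when the unique_ptr itself is destroyed (at the end of its enclosing scope, or on reset() or reassignment), not after its last use in the code. The sketch below mirrors the loop body of pdf2txt with a hypothetical `Page` stand-in that logs its construction and destruction; `Page`, `g_events` and `read_one_page` are illustrative names, and nothing here depends on Poppler.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Records construction, use and destruction so the lifetime is observable.
std::vector<std::string> g_events;

// Hypothetical stand-in for Poppler::Page.
struct Page
{
    Page() { g_events.push_back("created"); }
    ~Page() { g_events.push_back("destroyed"); }
    std::string text() const { return "page text"; }
};

// Same pattern as the loop body: wrap the raw pointer in a unique_ptr
// that is never used again, then read through the raw pointer.
std::string read_one_page()
{
    Page *page = new Page;
    std::unique_ptr<Page> page_del(page); // owns page until the end of this scope
    std::string s = page->text();         // page is still alive here
    g_events.push_back("read");
    return s;
} // page_del is destroyed here, which deletes page
```

Calling `read_one_page()` records "created", "read", "destroyed" in that order, so the raw pointer is still valid when `text()` is called through it.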

penguineer