I'm trying to fetch the text content of the first page of a PDF file using the npm module 'pdf-lib'.

However, when I fetch the content and print the result, I get a dump of object data like the one shown further down instead.

Could you please help me spot the problem?

Thanks in advance!

The printed result looks like this. What I want is the actual text content of the PDF page.

PDFPage {
  fontSize: 24,
  fontColor: { type: 'RGB', red: 0, green: 0, blue: 0 },
  lineHeight: 24,
  x: 0,
  y: 0,
  node: PDFPageLeaf {
    dict: Map(8) {
      [PDFName] => [PDFName],
      [PDFName] => [PDFRef],
      [PDFName] => [PDFDict],
      [PDFName] => [PDFArray],
      [PDFName] => [PDFRef],
      [PDFName] => [PDFDict],
      [PDFName] => [PDFName],
      [PDFName] => [PDFNumber]
    },
    ...
    ...
    ...

The Code:


const { resolve } = require('path');
const { PDFDocument } = require('pdf-lib'); // Library for reading PDF files
const fs = require('fs');

async function readDataset() {
    // Load the PDF document from disk
    const content = await PDFDocument.load(fs.readFileSync(resolve('./app/assets/pdfs/np.pdf')));

    // Get all pages of the document
    const contentPages = content.getPages();

    // Return the first page (this is a PDFPage object, not its text)
    return contentPages[0];
}

// Read data from dataset. `await` is not allowed at the top level of a
// CommonJS module, so the call is wrapped in an async function.
(async () => {
    try {
        const dataset = await readDataset();
        console.log(dataset);
    } catch (err) {
        console.error(err);
    }
})();

Vakindu

1 Answer

This is not generally possible at present (2021) with this library; see its current Limitations. The same information is on the npm page at https://www.npmjs.com/package/pdf-lib#limitations

#1

pdf-lib can extract the content of text fields (see PDFTextField.getText), but it cannot extract plain text on a page outside of a form field. This is a difficult feature to implement, but it is within the scope of this library and may be added to pdf-lib in the future. See #93, #137, #177, #329, and #380.
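
For the one case the limitation does cover, form text fields, a minimal sketch could look like the following. The helper names `collectTextFieldValues` and `readTextFields` are hypothetical, not part of pdf-lib. The sketch assumes that text fields are the only form fields exposing a `getText()` method, so it duck-types on that method instead of importing `PDFTextField`. It will not return plain page text, which is what the question asks for.

```javascript
// Collect { fieldName: text } for every form field that exposes getText().
// `form` is expected to look like the object returned by pdf-lib's
// PDFDocument.getForm(). Assumption: only text fields have getText().
function collectTextFieldValues(form) {
  const values = {};
  for (const field of form.getFields()) {
    if (typeof field.getText === 'function') {
      // getText() returns undefined for an empty field; store '' instead
      values[field.getName()] = field.getText() ?? '';
    }
  }
  return values;
}

// Hypothetical wrapper: load a PDF from disk and read its text fields.
async function readTextFields(pdfPath) {
  const fs = require('fs');
  const { PDFDocument } = require('pdf-lib'); // loaded only when called
  const pdfDoc = await PDFDocument.load(fs.readFileSync(pdfPath));
  return collectTextFieldValues(pdfDoc.getForm());
}
```

Calling `readTextFields('./app/assets/pdfs/np.pdf')` on a PDF without form fields should resolve to an empty object, which is consistent with the limitation quoted above.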

Future visitors should always check the link above for the current status.

K J