I'm trying to fetch the text contents of the first page of a PDF file using the npm module 'pdf-lib'.
However, when I fetch the contents and print the result, I get an object dump instead of the text; it looks something like the output further down.
Could you please help me spot the problem?
Thanks in advance!
This is what gets printed. What I want to get is the actual text content of the PDF page.
PDFPage {
  fontSize: 24,
  fontColor: { type: 'RGB', red: 0, green: 0, blue: 0 },
  lineHeight: 24,
  x: 0,
  y: 0,
  node: PDFPageLeaf {
    dict: Map(8) {
      [PDFName] => [PDFName],
      [PDFName] => [PDFRef],
      [PDFName] => [PDFDict],
      [PDFName] => [PDFArray],
      [PDFName] => [PDFRef],
      [PDFName] => [PDFDict],
      [PDFName] => [PDFName],
      [PDFName] => [PDFNumber]
    },
...
...
...
The Code:
const { resolve } = require('path');
const { PDFDocument } = require('pdf-lib'); // Library for reading PDF file
const fs = require('fs');

async function readDataset() {
    // Load the PDF document
    const content = await PDFDocument.load(fs.readFileSync(resolve('./app/assets/pdfs/np.pdf')));
    // Get the document's pages
    const contentPages = content.getPages();
    // Return the first page (a PDFPage object)
    return contentPages[0];
}

// Read data from dataset and print it.
// `await` is not allowed at the top level of a CommonJS file, so wrap the call.
(async () => {
    try {
        const dataset = await readDataset();
        console.log(dataset);
    } catch (err) {
        console.error(err);
    }
})();
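
For reference: as far as I can tell, pdf-lib is aimed at creating and modifying PDFs and does not offer a text-extraction API, so the first element of `getPages()` is a `PDFPage` object rather than text. Below is a minimal sketch of one possible alternative, assuming the separate `pdfjs-dist` package (v4 or later, which ships as an ES module) is installed. `itemsToText` and `readPageText` are hypothetical names of my own and not part of either library.

```javascript
// Pure helper: turn the items returned by pdf.js's getTextContent() into one string.
function itemsToText(items) {
    let text = '';
    for (const item of items) {
        // Entries without a string `str` (e.g. marked-content markers) are skipped
        if (typeof item.str !== 'string') continue;
        text += item.str;
        // pdf.js sets `hasEOL` on items that end a line
        if (item.hasEOL) text += '\n';
    }
    return text;
}

// Hypothetical helper: extract the text of one page (page numbers are 1-based in pdf.js).
async function readPageText(pdfPath, pageNumber = 1) {
    const fs = require('fs');
    // Assumption: pdfjs-dist v4+, loaded via dynamic import because it is an ES module
    const pdfjs = await import('pdfjs-dist/legacy/build/pdf.mjs');
    // pdf.js expects a Uint8Array rather than a Node Buffer
    const data = new Uint8Array(fs.readFileSync(pdfPath));
    const doc = await pdfjs.getDocument({ data }).promise;
    const page = await doc.getPage(pageNumber);
    const { items } = await page.getTextContent();
    return itemsToText(items);
}

// Usage (after `npm install pdfjs-dist`):
// readPageText('./app/assets/pdfs/np.pdf').then(console.log).catch(console.error);
```

How the items are joined is a judgment call: depending on the PDF, spacing and line breaks may need tuning, for example by inspecting each item's `transform` to detect position changes.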