I'm creating an application where users upload a PDF and the server extracts its text into JSON format. I am able to access the text, but I can't hold the response until the PDF extraction is complete. I'm unfamiliar with Formidable, so I may be missing something entirely.
I am using Formidable for uploading and PDFReader for text extraction. The front-end and back-end are on separate servers, and the app is only intended for local use, so that shouldn't be an issue. I'm able to console.log the text perfectly, and I would like to work with it in JSON format in some way. Specifically, I would like to append the text to the response sent back to the front-end, but I can't seem to hold the response until the text is ready.
const IncomingForm = require("formidable").IncomingForm;
const { PdfReader } = require('pdfreader');
const test = new PdfReader(this,1);

module.exports = function upload(req, res) {
  let str = ''
  let form = new IncomingForm();

  form.parse(req, () => {
    console.log('parse')
  });

  form.on("file", (field, file) => {
    test.parseFileItems(file.path, (err, item) => {
      if (err){
        console.log(err)
      }
      else if (item){
        if (item.text){
          console.log(item.text)
          str += item.text
        }
      }
    })
  });

  form.on("end", () => {
    console.log("reached end/str: ", str)
  });
};
I've attempted a number of different ways of handling the async functions, primarily within form.on('file'). The following attempts at form.on('file') all produce the same effect (the text is console.logged correctly, but only after form.on('end') is hit):
// Making the callback to form.on('file') async then traditional await
form.on("file", async (field, file) => {
  //...
  await test.parseFileItems(...)
  //...
  console.log(str) // After end of PDFReader code, shows blank

// Making cb async, then manually creating promise
form.on("file", async (field, file) => {
  //...
  let textProm = await new Promise ((res, rej) => //...
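For context on why the async listener attempts above do not delay 'end', here is a minimal sketch. An EventEmitter calls its listeners synchronously and ignores any promise they return, so an await inside a listener only pauses that listener, not the emitter. The sketch uses Node's built-in EventEmitter as a stand-in for the form; slowExtract, handleUpload, and the done callback are hypothetical names, not part of Formidable or pdfreader. It collects one promise per file and waits for all of them inside the 'end' listener.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for an asynchronous PDF extraction step.
function slowExtract(name) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(`text of ${name}`), 10);
  });
}

// Hypothetical helper: wires 'file' and 'end' listeners onto an emitter
// and calls done(texts) only after every extraction has finished.
function handleUpload(form, done) {
  const pending = [];

  form.on('file', (field, file) => {
    // Start the work and remember its promise; do not await here,
    // because the emitter would not wait for this listener anyway.
    pending.push(slowExtract(file.name));
  });

  form.on('end', async () => {
    // 'end' can fire before the extractions finish, so wait for them here.
    const texts = await Promise.all(pending);
    done(texts);
  });
}

// Usage with a plain EventEmitter playing the role of the form.
const fakeForm = new EventEmitter();
const result = new Promise((resolve) => handleUpload(fakeForm, resolve));
fakeForm.emit('file', 'upload', { name: 'a.pdf' });
fakeForm.emit('end');
result.then((texts) => console.log(texts));
```

In a real handler, done would be the place where the response is sent, so the response is held until every extraction promise has settled.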
I've also attempted to convert the text manually from the Buffer using fs.readFile, but this produces the same effect; I can only access the text after form.on('end') is hit.
One thing I've noticed is that form.on('file') is hit first, then the form.parse callback. It seems I may be parsing the document twice (once with Formidable and once with PDFReader), but this is probably necessary.
Also, after reading through the docs and Stack Overflow, I think I'm mixing the built-in middleware (form.parse/form.on/form.end) with manual callbacks, but I was unsure how to stick with just one and still be able to access the text.
Finally, PDFReader accesses text one line at a time, so the parseFileItems callback runs once for every line. I've attempted to resolve a Promise.all with the PdfReader instance, but I couldn't get it to work.
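One way to turn the per-item callback into a single awaitable value, as a minimal sketch: wrap the call in a Promise that accumulates item.text and resolves when the callback fires with no item, which is how pdfreader signals the end of the file. collectText and fakeParseItems are hypothetical names; fakeParseItems only imitates the (err, item) callback shape so the sketch runs without a PDF.

```javascript
// Hypothetical helper: parseItems is any function that takes a callback
// and calls it as (err, item) once per item, then once more with no item
// at the end of the file.
function collectText(parseItems) {
  return new Promise((resolve, reject) => {
    let text = '';
    parseItems((err, item) => {
      if (err) {
        reject(err);
      } else if (!item) {
        resolve(text); // end of file: hand back everything collected
      } else if (item.text) {
        text += item.text;
      }
    });
  });
}

// Stand-in parser that imitates the callback shape asynchronously.
function fakeParseItems(callback) {
  const items = [{ text: 'Hello ' }, { page: 1 }, { text: 'world' }];
  setTimeout(() => {
    items.forEach((item) => callback(null, item));
    callback(null, null); // signals end of file
  }, 10);
}

const extracted = collectText(fakeParseItems);
extracted.then((text) => console.log(text));
```

With the code from the question, this would presumably be called as collectText((cb) => test.parseFileItems(file.path, cb)); that call is untested here and assumes the end-of-file behaviour described above.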
Any help would be greatly appreciated!