I am working on a web application for an online library. I want to extract metadata from the PDFs that are uploaded, and for that I am using the Node.js library pdf.js-extract, with multer-gridfs-storage handling the upload. The problem is that I receive the PDF as a file object (req.file), but the extract function requires a path or link to the PDF file, so it throws the error

"TypeError [ERR_INVALID_ARG_TYPE]: The "path" argument must be one of type string, Buffer, or URL. Received type object"

I would like to know whether there is a way to pass the file as a link or save it locally temporarily, or whether another library would fit my needs better.

This is my current code.

const PDFExtract  = require('pdf.js-extract').PDFExtract;

app.post('/upload', upload.single('file'), (req, res) => {
  const pdfExtract = new PDFExtract();
  const options = {};

  pdfExtract.extract(req.file, options, (err, data) => {
      if (err){
        // return so the 200 response below is not sent as well
        return res.status(404).send({ message: err });
      }
      res.status(200).send({ message: data });
  });
});

(Edit for clarification) I am using multer with GridFS to upload the file to MongoDB.

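const crypto = require('crypto'); // used below for crypto.randomBytes
const path = require('path');     // used below for path.extname
const multer = require('multer');
const GridFsStorage = require('multer-gridfs-storage');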

// Create storage engine
const storage = new GridFsStorage({
  url: mongoURI,
  file: (req, file) => {
    return new Promise((resolve, reject) => {
      crypto.randomBytes(16, (err, buf) => {
        if (err) {
          return reject(err);
        }
        const filename = buf.toString('hex') + path.extname(file.originalname);
        const fileInfo = {
          filename: filename,
          bucketName: 'uploads'
        };
        resolve(fileInfo);
      });
    });
  }
});
const upload = multer({ storage });

Solution inspired by Oliver Nybo

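app.post('/upload', upload.single('file'), (req, res) => {
  const pdfExtract = new PDFExtract();
  const options = {};

  // Stream the stored file back out of GridFS (gfs is assumed to be a gridfs-stream instance)
  const readableStream = gfs.createReadStream({ filename: req.file.filename });
  const bufferArray = [];

  readableStream.on('data', (chunk) => {
    bufferArray.push(chunk);
  });
  // Report read failures instead of leaving the request hanging
  readableStream.on('error', (err) => {
    res.status(500).send({ message: err.message });
  });
  readableStream.on('end', () => {
    // Join the chunks into one Buffer and hand it to pdf.js-extract
    const buffer = Buffer.concat(bufferArray);
    pdfExtract.extractBuffer(buffer, options, (err, data) => {
      if (err) {
        // return so a second response is not sent after the error
        return res.status(404).send({ message: err });
      }
      res.status(200).send({ message: data });
    });
  });
});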
  • can't you just buffer the file to the function? Check https://stackoverflow.com/questions/19705972/buffer-entire-file-in-memory-with-nodejs – Carlos Alves Jorge May 07 '19 at 09:11
  • I'm looking into it but it seems like readFile and readFileSync also take paths, strings or buffers as a parameter. I am getting the same error using that. – Luis de la Cal May 07 '19 at 09:20

1 Answer

According to multer's API documentation, you can use req.file.path to get the full path of the uploaded file (this property is set by multer's disk storage engine).

const PDFExtract  = require('pdf.js-extract').PDFExtract;

app.post('/upload', upload.single('file'), (req, res) => {
  const pdfExtract = new PDFExtract();
  const options = {};

  pdfExtract.extract(req.file.path, options, (err, data) => {
      if (err){
        // return so the 200 response below is not sent as well
        return res.status(404).send({ message: err });
      }
      res.status(200).send({ message: data });
  });
});

Edit: I just read the multer options and there is an option called preservePath.

preservePath - Keep the full path of files instead of just the base name

Edit 2: I think you need to extract the file from the database with gridfs-stream, then convert it into a buffer (like in this thread), and then use PDFExtract's extractBuffer function.
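The "convert it into a buffer" step could be sketched roughly as below. This is only a minimal illustration using Node's built-in stream module: streamToBuffer is a hypothetical helper name, and the in-memory stream stands in for whatever read stream your GridFS client returns. The resulting Buffer is what you would then pass to extractBuffer.

```javascript
const { Readable } = require('stream');

// Hypothetical helper: collect every chunk of a readable stream into one Buffer.
function streamToBuffer(readable) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    readable.on('data', (chunk) => chunks.push(Buffer.from(chunk)));
    readable.on('error', reject);
    readable.on('end', () => resolve(Buffer.concat(chunks)));
  });
}

// Stand-in for the GridFS read stream (assumption: the real one emits Buffer chunks too).
const demoStream = Readable.from([Buffer.from('%PDF-'), Buffer.from('1.4')]);

streamToBuffer(demoStream).then((buffer) => {
  // In the route handler, this is where the buffer would go to pdfExtract.extractBuffer(...)
  console.log(buffer.length, buffer.toString());
});
```

Wrapping the collection in a Promise keeps the error path in one place: a failed read rejects instead of leaving the request without a response.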

Oliver Nybo
  • Oddly enough, req.file.path is undefined. Doing a console.log of req.file gives `{ fieldname: 'file', originalname: 'Alice_in_Wonderland.pdf', encoding: '7bit', mimetype: 'application/pdf', id: 5cd1528c0614d139ec8f5774, filename: '3c90b9cfa1925acf4d75d6d629e5909c.pdf', metadata: null, bucketName: 'uploads', chunkSize: 261120, size: 3083601, md5: '22f3af3730bc9820c1bf6d90b3271a47', uploadDate: 2019-05-07T09:40:32.796Z, contentType: 'application/pdf' }` – Luis de la Cal May 07 '19 at 09:44
  • That is really weird... Can you show us how you initialize multer? And are you using the latest version of multer? – Oliver Nybo May 07 '19 at 09:51
  • I just read the [multer options](https://www.npmjs.com/package/multer#multeropts) and there is an option called `preservePath`, try setting it to true. @LuisdelaCal – Oliver Nybo May 07 '19 at 09:54
  • I edited my question, I am also using gridfs with multer. The library multer-gridfs-storage – Luis de la Cal May 07 '19 at 09:58
  • @LuisdelaCal i'm not completely sure, but can't you then add a `path` property to `fileInfo` with a value of `file.path`? – Oliver Nybo May 07 '19 at 10:03
  • I have tried it but the response stays the same, even with preservePath: true – Luis de la Cal May 07 '19 at 10:25
  • @LuisdelaCal The last thing I can think of is to generate the path yourself by combining the upload destination and the `req.file.filename`. – Oliver Nybo May 07 '19 at 10:49
  • The problem with that is that the upload destination is a mongoDB database – Luis de la Cal May 07 '19 at 11:06
  • It took a while, but I managed to solve it using your indications in Edit2, thank you – Luis de la Cal May 07 '19 at 22:08
  • @LuisdelaCal I'm happy to hear you figured it out – Oliver Nybo May 08 '19 at 08:31