
OS: Ubuntu 20.04, Node.js: 14.18.3, npm: 6.14.15, pdfMake: 0.1.72 (used to create PDF files from a file or DB queries)

Currently, I'm trying to make a PDF from a file (it could be a DB query) and send it to the client through a stream, delivering chunks of data to the user to avoid buffering the whole document in the server's RAM. I can already send it to the user, but it takes too much time to download a 300 KB PDF, almost 15 seconds. I've seen that every chunk received by the client is 16 KB, which I believe is the fixed size of the res object's chunks.

I use a readable stream coming from the pdfMake library and try to pipe it into the res object (Express).

Here is the creation of the file (pdfMake library):

const fs = require('fs');

// Load the large text that will become the PDF body
const bigfile = fs.readFileSync('./bigfile.txt', 'utf8');

function docDefinition() {
    let content = [{
        text: 'This is a header',
        alignment: 'center',
        fontSize: 25,
        margin: [0, 0, 0, 0]
    }];

    content.push({ text: bigfile });

    return {
        content: content
    };
}

Here, I generate the raw PDF itself as a readable stream and pass it to my Express route in a callback:

const path = require('path');
const pdfMakePrinter = require('pdfmake');

const generatePdf = (docDefinition, callback) => {
    const fontDescriptors = {
        Roboto: {
            normal: path.join(__dirname, '/fonts/Roboto-Regular.ttf'),
            bold: path.join(__dirname, '/fonts/Roboto-Medium.ttf'),
            italics: path.join(__dirname, '/fonts/Roboto-Italic.ttf'),
            bolditalics: path.join(__dirname, '/fonts/Roboto-MediumItalic.ttf'),
        },
    };
    const printer = new pdfMakePrinter(fontDescriptors);

    const doc = printer.createPdfKitDocument(docDefinition);

    /* Something to add: `doc` has its own events for streaming (I guess):
    doc.on('data', chunk => {
        // send the current chunk to the client somehow
    });

    doc.on('end', () => {
        // finished streaming
    });
    */

    callback(doc);
};

My main server.js, where I try to send the PDF to the user with chunked transfer:

app.get('/file', (req, res) => {
    generatePdf(
        docDefinition, 
        readable => {
            res.set({
                "Content-Type": "application/pdf",
                "Transfer-Encoding": "chunked",
                "Connection": "keep-alive"
            });

            console.log('res.HighWaterMark', res.writableHighWaterMark);
            console.log('readable highWaterMark', readable.readableHighWaterMark); // public accessor instead of _readableState

            readable.pipe(res)
            readable.end()
        })
})

I just tried streaming a video in my browser and it was very easy thanks to the internal fs module. But here I'm using an external library (pdfMake) to create my readable stream. I'm a beginner with streams in Node, and I'd appreciate any suggestion or help with this particular problem. :S

The source code can be found here: https://github.com/biagiola/streamPDFtoClient

1 Answer


You don't need to manage a Readable yourself, because Express' Response is based on Node's http.ServerResponse, which is itself a stream.

Edit: Just pass the response to generatePdf and call doc.pipe(res) and doc.end() there. I changed callback to res in my sample code:

const pdfMakePrinter = require('pdfmake');

const generatePdf = (docDefinition, res) => {
  // ... setup code here.
  const doc = printer.createPdfKitDocument(docDefinition);
  doc.pipe(res); // stream the PDF chunks directly into the response
  doc.end();     // finalize the document so the response can end
}

server.js:

app.get('/file', (req, res) => {
    generatePdf(
        docDefinition, 
        res);
});

Further Edits:

It could be the speed of the library, and you may have to live with it. When I use the following code to time the generation process (removing the creation of the raw document), I get about 7.25 seconds on my system.

app.get('/file', (req, res, next) => {
    res.attachment('myFancyPdf.pdf');
    const start = Date.now();
    generatePdfBase64(
        docDefinition,
        res);
    next(console.log(Date.now() - start)); // logs the elapsed ms, then calls next()
});

My development system is an Intel i7-10700T CPU with 32GB of RAM.

kevintechie
  • The file that I have is really the equivalent of the DB query that is pushed into the content of the PDF maker. The result of creating the PDF with that library is **const doc**: res.write(doc). I will try to look at all the properties of the doc element, see if there is raw PDF data only, and try it the way you are saying – David Biagiola Feb 14 '22 at 18:56
  • Another thing I'm thinking about is this stream using events: doc.on('data', chunk => { /* send chunk to client somehow */ }); doc.on('end', () => { /* end the streaming */ }); doc.end(); That is similar to using doc.pipe() and doc.end(). But the second way was taking too much time. – David Biagiola Feb 14 '22 at 19:04
  • It doesn't matter if the source is a DB call or a text file. Also, the way you're handling createPdfKitDocument(), it is not a stream. The above code doesn't really match your source code link, but even there, you're done with the stream at the end of the generatePdfBase64() call (you call .end() and return the doc). Finally, I'm not sure where you're getting your sample code, but it's NOT a good idea to dig into the source of a library for the functions you need. You should only use exposed methods. – kevintechie Feb 14 '22 at 19:21
  • I understand the changes you put in your recently edited post, but it has the same result I mentioned: it takes too much time to deliver the PDF. With a 280 KB PDF it takes 13 seconds. – David Biagiola Feb 14 '22 at 20:10
  • If I send the whole file (280 KB) at once, as normal, the client gets the file immediately. I tried with a 2 MB PDF and it takes forever to download. I was thinking about the highWaterMark of the res or something like that; there is a bottleneck somewhere, I don't really know :s – David Biagiola Feb 14 '22 at 20:21
  • You can't compare transfer of a static file with that of a file created on the fly. If you want instant transfer, you could always pre-render the pdf files if the content supports that scenario. You could also try pre-rendering a portion of the document and just inserting the dynamic content. You'll have to test this scenario for speed. It could be just as slow. – kevintechie Feb 15 '22 at 17:27
  • In the two cases I mentioned (1 - sending the file as a whole, after pushing each chunk into a string and sending it as the raw PDF, and 2 - streaming with pipe), I'm creating the PDF file on the fly. – David Biagiola Feb 15 '22 at 21:09
  • PdfMake is the problem: 2x text -> 40x slowdown! https://github.com/bpampuch/pdfmake/issues/280 I read that issue, but it's not clear to me. I think it's better to use another backend (e.g. Java) to make the PDFs and use my main Node backend as a channel to deliver the chunked data to the final client. – David Biagiola Feb 15 '22 at 21:22