
So I have a stream that generates data and one that writes it to a database. Writing to the database is slow, so I use the _writev method to write a batch of up to 3000 chunks at once.

const { pipeline } = require('stream');

const generator = new DataGenerator(); // extends Readable
const dbWriter = new DBWriter({ highWaterMark: 3000 }); // extends Writable, implements _writev

pipeline(generator, dbWriter, (err) => {
  if (err) console.error('Pipeline failed:', err);
});
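
The writer looks roughly like this (a minimal sketch, assuming object mode; writeToDb is a hypothetical stand-in for the actual database call):

const { Writable } = require('stream');

class DBWriter extends Writable {
  constructor(options) {
    super({ objectMode: true, ...options });
  }

  _writev(chunks, callback) {
    console.log(chunks.length); // how many chunks were batched for this call
    // writeToDb is a hypothetical stand-in for the actual database call
    writeToDb(chunks.map(({ chunk }) => chunk))
      .then(() => callback(), callback);
  }
}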

But when I log the chunk counts in the _writev method, I get the following output:

1
2031
969
1
1635
1365
1
1728
1272
1
...

I understand the first line being 1: a chunk arrives, the DB starts writing it, and 2031 chunks come in the meantime.

Then the DB starts writing the 2031 chunks and another 969 chunks come in the meantime, not 3000. And in the next step, only 1 chunk is written again. It is as if accepting chunks into the buffer resumed only once everything has been written, not as soon as the 3000-chunk buffer has free space.

What I would expect:

1
2031
3000
3000
3000
...
3000
123

Why?


1 Answer


Well, because there is no guarantee that you will get 3000 chunks of data; highWaterMark only sets the limit of the writable stream's internal buffer. Receiving an arbitrary number of chunks is normal, because the readable stream knows nothing about your buffer size. Best regards.
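
If you need fixed-size batches, one option is to accumulate chunks yourself inside the writable and flush only when a full batch has been collected. A rough sketch of that idea (writeBatchToDb is a hypothetical stand-in for the real database call):

const { Writable } = require('stream');

class BatchingDBWriter extends Writable {
  constructor(batchSize) {
    super({ objectMode: true, highWaterMark: batchSize });
    this.batchSize = batchSize;
    this.batch = [];
  }

  _write(chunk, encoding, callback) {
    this.batch.push(chunk);
    if (this.batch.length < this.batchSize) {
      return callback(); // keep accumulating, ask for more data
    }
    const full = this.batch;
    this.batch = [];
    // writeBatchToDb is a hypothetical stand-in for the real DB call
    writeBatchToDb(full).then(() => callback(), callback);
  }

  _final(callback) {
    // flush the last, possibly partial batch when the stream ends
    if (this.batch.length === 0) return callback();
    writeBatchToDb(this.batch).then(() => callback(), callback);
  }
}

The trade-off is latency: a batch waits until it is full (or until the stream ends), so with a slow producer the last rows may sit in memory for a while.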

  • But writing 969 chunks of data must surely take more time than generating the one chunk that is processed next, so why aren't more chunks buffered in the meantime? Writing 1 chunk every third call actually slows the writing down a lot. – SmallhillCZ May 07 '21 at 12:50
  • That is a question about the implementation details of the readable stream you are using. If it comes from an NPM package, you had better submit an issue on their GitHub repository. – Ayzrian May 07 '21 at 12:54
  • It is actually an HTTP stream, and it seems to provide data whenever asked, so it must be stopped by backpressure. But it looks like the DBWriter resumes accepting data only when everything has been written, rather than as soon as the buffer has free space again. E.g. when it starts writing to the DB it should start accepting data into the buffer, but instead it waits for everything to be written. And then, logically, it tries to write the first chunk received, hence the 1-chunk batch (see the sketch below). – SmallhillCZ May 07 '21 at 13:01
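
The pairs in the question's output are suggestive: 2031 + 969, 1635 + 1365, and 1728 + 1272 each sum to exactly 3000. That is consistent with chunks handed to _writev still counting toward the writable's buffered length until their callback fires, and with 'drain' being emitted only once that length drops to zero: the readable resumes only after the buffer fully empties, so its very next chunk is dispatched alone. The behaviour can be reproduced with a self-contained sketch (the timings are arbitrary; a generator slower than the writer's cycle produces the three-way split from the question):

const { Readable, Writable, pipeline } = require('stream');

let n = 0;

// Generator that pushes one chunk per event-loop turn, so data
// keeps arriving while a database write is in flight.
const generator = new Readable({
  objectMode: true,
  read() {
    setImmediate(() => this.push(++n <= 50000 ? n : null));
  },
});

// Writer whose _writev takes 50 ms per call, regardless of batch size.
const dbWriter = new Writable({
  objectMode: true,
  highWaterMark: 3000,
  writev(chunks, callback) {
    console.log(chunks.length);
    setTimeout(callback, 50);
  },
});

pipeline(generator, dbWriter, (err) => {
  if (err) console.error('Pipeline failed:', err);
});

This prints roughly 1, 2999, 1, 2999, ...: each 1-chunk call is the first write after 'drain', dispatched immediately because the buffer is empty, and everything that arrives while it is in flight gets flushed as the next batch.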