
I am reading data from a stream in Node.js and then processing it with an async function in a transform stream. I would like the transform stream to initiate several calls to the async function in parallel, but it seems to make them one at a time.

To illustrate my expectations I have written a small program below that generates the numbers 0 through limit - 1 and passes them through a transform stream that increments each number after a small delay. If you run the program, the numbers 1 to 20 are logged in sequence, each after a small delay.

I would have expected them to be logged in chunks of 16 + 4, since the default highWaterMark is 16. Is it possible to get the behavior I want, and if so, how?

That is, the read stream generates data very fast; the transform is slower, but it should receive chunks up to the high water mark, wait until that data has been processed, and then ask the read stream for more.

const stream = require('stream')
const limit = 20
let index = 0

const numberStream = new stream.Readable({
  objectMode: true,
  read (amount) {
    // Push up to `amount` numbers per read() call, then end the stream.
    const innerLimit = Math.min(index + amount, limit)
    while (index < innerLimit) {
      this.push(index++)
    }
    if (index === limit) {
      this.push(null)
    }
  },
})

const delayedIncStream = new stream.Transform({
  objectMode: true,
  transform (item, _, cb) {
    // Increment each item after 100 ms; the next chunk is only
    // delivered once cb has been called.
    setTimeout(() => cb(null, item + 1), 100)
  },
})

const resultStream = numberStream.pipe(delayedIncStream)

resultStream.on('data', console.log)
– Ludwig Magnusson

2 Answers


The answer is no, as explained in the last part of this section of the documentation: https://nodejs.org/api/stream.html#stream_transform_transform_chunk_encoding_callback

> transform._transform() is never called in parallel; streams implement a queue mechanism, and to receive the next chunk, callback must be called, either synchronously or asynchronously.
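
For completeness, the queue can be worked around by hand: invoke the callback immediately so _transform keeps receiving chunks, and track the in-flight async work yourself. A rough sketch of that idea (note it buffers every result until the stream ends, so backpressure is lost):

const stream = require('stream')

class BufferedParallelTransform extends stream.Transform {
  constructor () {
    super({ objectMode: true })
    this.pending = [] // one promise per in-flight increment
  }

  _transform (item, _, cb) {
    // Start the async work, then acknowledge the chunk immediately
    // so the stream delivers the next one without waiting.
    this.pending.push(
      new Promise(resolve => setTimeout(() => resolve(item + 1), 100))
    )
    cb()
  }

  _flush (cb) {
    // Promise.all resolves in input order, so output order is preserved.
    Promise.all(this.pending)
      .then(results => {
        results.forEach(result => this.push(result))
        cb()
      })
      .catch(cb)
  }
}

Dropped in place of delayedIncStream in the question's program, this logs all twenty numbers together after roughly one delay instead of twenty.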

– Ludwig Magnusson

You could use the npm package parallel-transform-stream to achieve just that while preserving the order of the transformed data.

Your example could then be rewritten as follows to transform all numbers in parallel:

const stream = require('stream')
const ParallelTransform = require('parallel-transform-stream').default
const limit = 20
let index = 0

const numberStream = new stream.Readable({
  objectMode: true,
  read (amount) {
    const innerLimit = Math.min(index + amount, limit)
    while (index < innerLimit) {
      this.push(index++)
    }
    if (index === limit) {
      this.push(null)
    }
  },
})

const delayedIncStream = new (ParallelTransform.create((item, _, cb) => {
  setTimeout(() => cb(null, item + 1), 100)
}))({
  objectMode: true,
  maxParallel: 20 // run up to 20 transforms concurrently
})

const resultStream = numberStream.pipe(delayedIncStream)

resultStream.on('data', console.log)
– whY
The solution can also be implemented with the much more robust [parallel-transform](https://www.npmjs.com/package/parallel-transform) module on npm. – Gagan Sep 08 '22 at 04:03
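
For reference, a minimal sketch of the same example using the parallel-transform module mentioned in the comment (based on that module's documented transform(maxParallel, onTransform) signature; treat the exact API details as an assumption):

const stream = require('stream')
const transform = require('parallel-transform')
const limit = 20
let index = 0

const numberStream = new stream.Readable({
  objectMode: true,
  read (amount) {
    const innerLimit = Math.min(index + amount, limit)
    while (index < innerLimit) {
      this.push(index++)
    }
    if (index === limit) {
      this.push(null)
    }
  },
})

// 20 is the maximum number of concurrently running transforms;
// parallel-transform emits results in input order by default.
const delayedIncStream = transform(20, (item, cb) => {
  setTimeout(() => cb(null, item + 1), 100)
})

numberStream.pipe(delayedIncStream).on('data', console.log)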