
I'm trying to write Node.js code that reads (audio) files and streams their content to a remote service (Dialogflow). I'm having trouble ensuring the order of the chunks sent to the stream. Most of the time everything arrives in the right order, but once in a while the data is sent out of order.

Pseudo code:

for (var i = 0; i < numFiles; ++i) {
  await sendData(fs.createReadStream(filenames[i]), i);
}

...

async function sendData(inputDataStream, chunkIndex) {
  await inputDataStream
    .pipe(new Transform({
      objectMode: true,
      transform: (obj, _, next) => {
        console.log('Sending chunk ' + chunkIndex);
        next(null, <some data>);                        
      }
    }), {end: false})
  .pipe(outputStream, {end: false});
}

I can see that 'Sending chunk ...' is printed out of order sometimes.

Q: Is there a way to avoid this problem?

Another issue is that, while each chunk is usually sent contiguously, occasionally some chunks are split and sent as smaller sub-chunks (even though each file is not large). [I repeated this experiment many times on the same set of files.]

Q: Is there a way I can control the chunk size? (what did I do wrong here?)

Q: Is this because the remote service cannot handle the rate of transmission? If so, how should I properly react to that?

[I have also tried using pump(), but still observed the same behavior] Thanks in advance.

thammaknot

1 Answer


For Dialogflow, I have used the following pump approach, and it works fine.

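const fs = require('fs');
const util = require('util');
const {pipeline, Transform} = require('stream');

// Assumption: pump is a promisified stream.pipeline.
const pump = util.promisify(pipeline);

// streamFile is an illustrative wrapper name; detectStream is an
// already-initialised Dialogflow streaming request.
async function streamFile(filename, detectStream) {
  await pump(
    fs.createReadStream(filename),
    // Wrap each raw audio chunk in the message shape Dialogflow expects.
    new Transform({
      objectMode: true,
      transform: (obj, _, next) => {
        next(null, {inputAudio: obj});
      },
    }),
    detectStream
  );
}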
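
For the multi-file case in the question, note that .pipe() returns the destination stream, not a promise, so awaiting it does not wait for the file to finish and the next file can start while the previous one is still being read. One possible way around that, sketched below under the assumption of Node.js 15 or later (for stream/promises), is to read the files one after another through a single source stream. The helper names readFilesInOrder and sendFilesInOrder, and the wrapChunk parameter, are made up for this illustration. The highWaterMark option of fs.createReadStream sets the read size (64 KiB by default), which is why a larger file shows up as several chunks. Unlike the {end: false} pipes in the question, pipeline ends the destination once the last file is done.

```javascript
const fs = require('fs');
const { Readable, Transform } = require('stream');
const { pipeline } = require('stream/promises');

// Hypothetical helper: yields the chunks of each file in turn, so one file is
// read to the end before the next one is opened.
async function* readFilesInOrder(filenames, highWaterMark = 64 * 1024) {
  for (const filename of filenames) {
    for await (const chunk of fs.createReadStream(filename, { highWaterMark })) {
      yield chunk;
    }
  }
}

// Hypothetical helper: sends all files through one pipeline into outputStream.
// wrapChunk turns a raw Buffer into the message the destination expects,
// e.g. (chunk) => ({ inputAudio: chunk }) for Dialogflow.
async function sendFilesInOrder(filenames, outputStream, wrapChunk) {
  await pipeline(
    Readable.from(readFilesInOrder(filenames)),
    new Transform({
      objectMode: true,
      transform: (chunk, _, next) => next(null, wrapChunk(chunk)),
    }),
    outputStream
  );
}
```

Something like `await sendFilesInOrder(filenames, detectStream, (chunk) => ({inputAudio: chunk}))` would then replace the loop. pipeline also propagates backpressure, so the files are only read as fast as the destination accepts data.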

Ref: link

I haven't faced any issues with pump so far. I have also come across one more use case, in which a WebSocket connection receives audio from a streaming endpoint and that audio is then used for intent detection. (I have used this with both Dialogflow ES and CX.) Example for ES:

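// Assumed packages: @google-cloud/dialogflow, ws and uuid.
const dialogflow = require('@google-cloud/dialogflow');
const WebSocket = require('ws');
const uuid = require('uuid');

// projectId, encoding, sampleRateHertz, languageCode and port are assumed to be
// configured elsewhere; sessionID is assumed to start as a fresh UUID.
let sessionID = uuid.v4();

function getDialogflowStream() {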
    let sessionClient = new dialogflow.SessionsClient();
    let sessionPath = sessionClient.projectAgentSessionPath(
        projectId,
        sessionID,
    );
    // First Request 
    let initialStreamRequest = {
        session: sessionPath,
        queryInput: {
            audioConfig: {
                audioEncoding: encoding,
                sampleRateHertz: sampleRateHertz,
                languageCode: languageCode,
            },
            singleUtterance: true,
        },
    };
    const detectStream = sessionClient
        .streamingDetectIntent()
        .on('error', error => {
            console.error(error);
            writeFlag = false;
            detectStream.end();
        })
        .on('data', data => {
            if (data.recognitionResult) {
                console.log(
                    `Intermediate transcript: ${data.recognitionResult.transcript}`
                );
            } else {
                // Pass the object itself so it is not printed as "[object Object]".
                console.log('Query results:', data.queryResult);
            }
        });
    // Write the initial stream request to config for audio input.
    detectStream.write(initialStreamRequest);
    return detectStream;
}
const wss = new WebSocket.Server({
    port,
    handleProtocols: (protocols, req) => {
        return 'dialogflow.stream';
    }
});
wss.on('connection', (ws, req) => {
    console.log(`received connection from ${req.socket.remoteAddress}`);
    let dialogflowStreamer = getDialogflowStream();
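    // Note: with ws 8 and later, text frames also arrive as Buffers and the
    // handler receives a second isBinary argument; the typeof check below only
    // separates text from audio on older ws versions.
    ws.on('message', (message) => {
        if (typeof message === 'string') {
            console.log(`received message: ${message}`);
            // calluuid is not defined in this excerpt; it comes from the surrounding application.
            console.log(`UUID: ${calluuid}`);
        } else if (message instanceof Buffer) {
            // Transform message and write to detect
            dialogflowStreamer.write({ inputAudio: message });
        }
    });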
    ws.on('close', (code, reason) => {
        console.log(`socket closed ${code}:${reason}`);
        dialogflowStreamer.end();
        sessionID = uuid.v4();
    });
});
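
On the question of whether the remote side can keep up: the handler above calls write() for every message without looking at its return value. A writable stream returns false from write() when its internal buffer is full and emits 'drain' once it can take more. The following is only a rough sketch of how writes could be queued so that order is kept and backpressure is respected; writeWithBackpressure and makeOrderedWriter are made-up helper names, not part of ws or the Dialogflow client.

```javascript
const { once } = require('events');

// Hypothetical helper: writes one chunk and, if write() returned false
// (internal buffer full), waits for 'drain' before resolving.
async function writeWithBackpressure(stream, chunk) {
  if (!stream.write(chunk)) {
    await once(stream, 'drain');
  }
}

// Hypothetical helper: returns a function that queues writes on a promise
// chain, so each chunk is written only after the previous one was accepted.
function makeOrderedWriter(stream) {
  let tail = Promise.resolve();
  return (chunk) => {
    tail = tail.then(() => writeWithBackpressure(stream, chunk));
    return tail;
  };
}
```

In the WebSocket handler this could look like `const writeAudio = makeOrderedWriter(dialogflowStreamer);` followed by `writeAudio({ inputAudio: message })` for each binary message. If the stream emits an error while a write is waiting, the returned promise rejects, so a real handler would need a catch.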

One more thing: make sure the sample rate and encoding in your input configuration match those of the audio file, because I have run into issues when they differ.