0

I am trying to make a user interface where a user would click a mic button and speak to it and then a live stream transcription would appear on the screen. I also need to generate an audio and play it to the user as a reply to the user's input audio. There are already packages who do that(just need to npm install them), but I was wondering if I should use them or use AWS-SDK POLLY and client-transcribe-streaming packages offered by amazon web services. The already existing packages seem very easy to use but am not sure if that is reliable. Where as if I use AWS-SDK,it seems like setting up the mic and speaker of the browser and setting up the environment seems very complicated as I was trying it. The following code shows AWS-SDK method of transcribing live speaking streams but I couldn't see the transcribed texts. Is there something wrong with the code? Any suggestion would help

import { SECRET_ACCESS_KEY, ACCESS_KEY_ID } from "./transcribeGlobal.js";
import React, { useState } from "react";
import { TranscribeStreamingClient } from "@aws-sdk/client-transcribe-streaming";
import { MicrophoneStream } from "microphone-stream";

const accessKeyId = ACCESS_KEY_ID;
const secretAccessKey = SECRET_ACCESS_KEY;
const region = "us-east-1";
const languageCode = "en-US";

const transcribeClient = new TranscribeStreamingClient({
  region,
  credentials: { accessKeyId, secretAccessKey },
});

const request = {
  AudioStream: {},
  LanguageCode: languageCode,
  MediaEncoding: "pcm",
  SampleRateHertz: 44100,
};

function TranscribeClientSpeech() {
  const [transcription, setTranscription] = useState("");

  const handleStream = (stream) => {
    const micStream = new MicrophoneStream(stream);

    micStream.on("data", (chunk) => {
      const audioChunk = new TextEncoder().encode(chunk);
      request.AudioStream = { AudioEvent: { AudioChunk: audioChunk } };
      transcribeClient.send(request);
    });

    transcribeClient.on("data", (response) => {
      const transcript = response.TranscriptEvent.Transcript.Results.reduce(
        (acc, result) => acc + result.Alternatives[0].Transcript,
        ""
      );
      setTranscription(transcript);
    });

    transcribeClient.on("error", (err) => {
      console.error("Error with transcription stream", err);
    });
  };
  console.log(transcription)
  return (
    <div>
      <h1>Live Transcription</h1>
      <p>{transcription}</p>
      <button onClick={() => navigator.mediaDevices.getUserMedia({ audio: true }).then(handleStream)}>
        Start Transcription
      </button>
      <p>{transcription}</p>
    </div>
  );
}

export default TranscribeClientSpeech;

Thanks for the suggestions in advance

I am expecting for the transcribed texts to show up, but it's still empty

JustDev
  • 37
  • 4
  • Doing "live" transcription via Amazon Polly is quite complex. Plus, it involves sending data up to the cloud and getting asynchronous responses. If you can install a local package that does what you need, then I would recommend that as a simpler solution. – John Rotenstein Mar 17 '23 at 10:56
  • I agree with what @JohnRotenstein said. I was facing a similar issue. So I used a package called `microphone-stream` to stream the audio to AWS to get the transcription and the browser native API to store the audio as well. You can take a look at the Repo here. `https://github.com/yashjais/AWS-Audio-Transcribe` – Yash Mar 17 '23 at 12:52

0 Answers0