Transcribe a text from speech and generate Audio from text

Question

I am trying to make a user interface where a user would click a mic button and speak to it and then a live stream transcription would appear on the screen. I also need to generate an audio and play it to the user as a reply to the user's input audio. There are already packages who do that(just need to npm install them), but I was wondering if I should use them or use AWS-SDK POLLY and client-transcribe-streaming packages offered by amazon web services. The already existing packages seem very easy to use but am not sure if that is reliable. Where as if I use AWS-SDK,it seems like setting up the mic and speaker of the browser and setting up the environment seems very complicated as I was trying it. The following code shows AWS-SDK method of transcribing live speaking streams but I couldn't see the transcribed texts. Is there something wrong with the code? Any suggestion would help

import { SECRET_ACCESS_KEY, ACCESS_KEY_ID } from "./transcribeGlobal.js";
import React, { useState } from "react";
import { TranscribeStreamingClient } from "@aws-sdk/client-transcribe-streaming";
import { MicrophoneStream } from "microphone-stream";

const accessKeyId = ACCESS_KEY_ID;
const secretAccessKey = SECRET_ACCESS_KEY;
const region = "us-east-1";
const languageCode = "en-US";

const transcribeClient = new TranscribeStreamingClient({
  region,
  credentials: { accessKeyId, secretAccessKey },
});

const request = {
  AudioStream: {},
  LanguageCode: languageCode,
  MediaEncoding: "pcm",
  SampleRateHertz: 44100,
};

function TranscribeClientSpeech() {
  const [transcription, setTranscription] = useState("");

  const handleStream = (stream) => {
    const micStream = new MicrophoneStream(stream);

    micStream.on("data", (chunk) => {
      const audioChunk = new TextEncoder().encode(chunk);
      request.AudioStream = { AudioEvent: { AudioChunk: audioChunk } };
      transcribeClient.send(request);
    });

    transcribeClient.on("data", (response) => {
      const transcript = response.TranscriptEvent.Transcript.Results.reduce(
        (acc, result) => acc + result.Alternatives[0].Transcript,
        ""
      );
      setTranscription(transcript);
    });

    transcribeClient.on("error", (err) => {
      console.error("Error with transcription stream", err);
    });
  };
  console.log(transcription)
  return (
    <div>
      <h1>Live Transcription</h1>
      <p>{transcription}</p>
      <button onClick={() => navigator.mediaDevices.getUserMedia({ audio: true }).then(handleStream)}>
        Start Transcription
      </button>
      <p>{transcription}</p>
    </div>
  );
}

export default TranscribeClientSpeech;

Thanks for the suggestions in advance

I am expecting for the transcribed texts to show up, but it's still empty

Doing "live" transcription via Amazon Polly is quite complex. Plus, it involves sending data up to the cloud and getting asynchronous responses. If you can install a local package that does what you need, then I would recommend that as a simpler solution. — John Rotenstein, Mar 17 '23 at 10:56
I agree with what @JohnRotenstein said. I was facing a similar issue. So I used a package called `microphone-stream` to stream the audio to AWS to get the transcription and the browser native API to store the audio as well. You can take a look at the Repo here. `https://github.com/yashjais/AWS-Audio-Transcribe` — Yash, Mar 17 '23 at 12:52

Transcribe a text from speech and generate Audio from text

0 Answers0