
I'm trying to write an app that will listen to my computer's audio and transcribe it using Google Speech Recognition.

I've been able to record the system sound using WasapiLoopbackCapture, and I've been able to use the Google streaming recognition API with test files, but I was not able to merge the two together.

When I stream the audio from WasapiLoopbackCapture to Google, it doesn't return any results.

I've based my code on the Google code sample at: https://github.com/GoogleCloudPlatform/dotnet-docs-samples/blob/9588cee6d96bfe484c8e189e9ac2f6eaa3c3b002/speech/api/Recognize/InfiniteStreaming.cs#L225

private WaveInEvent StartListening()
{
    var waveIn = new WaveInEvent
    {
        DeviceNumber = 0,
        WaveFormat = new WaveFormat(SampleRate, ChannelCount)
    };
    waveIn.DataAvailable += (sender, args) =>
        _microphoneBuffer.Add(ByteString.CopyFrom(args.Buffer, 0, args.BytesRecorded));
    waveIn.StartRecording();
    return waveIn;
}
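
For context, the sample pairs that capture with a streaming config along these lines (a sketch based on the linked InfiniteStreaming.cs; LINEAR16 means raw little-endian 16-bit PCM, which is exactly what WaveInEvent delivers):

    // requires: using Google.Cloud.Speech.V1;
    var config = new RecognitionConfig
    {
        Encoding = RecognitionConfig.Types.AudioEncoding.Linear16, // 16-bit PCM
        SampleRateHertz = SampleRate, // must match the capture format
        LanguageCode = "en-US"
    };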

I adjusted it to use WasapiLoopbackCapture:

private IDisposable StartListening()
{
    var waveIn = new WasapiLoopbackCapture();

    SampleRate = waveIn.WaveFormat.SampleRate;
    ChannelCount = waveIn.WaveFormat.Channels;
    BytesPerSecond = SampleRate * ChannelCount * BytesPerSample;

    Console.WriteLine(SampleRate);
    Console.WriteLine(BytesPerSecond);
    waveIn.DataAvailable += (sender, args) =>
        _microphoneBuffer.Add(ByteString.CopyFrom(args.Buffer, 0, args.BytesRecorded));
    waveIn.StartRecording();
    return waveIn;
}

But it doesn't return any transcribed text.

I've saved the input stream to a file and it played back fine, so the sound is getting there. My guess is that the wave format coming from WasapiLoopbackCapture is not compatible with what Google expects; I tried some conversions but couldn't get them to work.
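
A quick way to confirm the mismatch is to print the loopback format. The exact values depend on the machine's shared mix format, but on most systems it is 32-bit IEEE float, stereo, at 44100 or 48000 Hz, none of which matches a LINEAR16 config:

    var probe = new WasapiLoopbackCapture();
    // Typically prints something like "32 bit IEEFloat: 48000Hz 2 channels",
    // while Google is being told to expect 16-bit PCM.
    Console.WriteLine(probe.WaveFormat);
    probe.Dispose();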

I've reviewed the following topics on Stack Overflow, but still couldn't get it to work: "Resampling WasapiLoopbackCapture" and "NAudio - Convert 32 bit wav to 16 bit wav".

And tried combining the two:

private IDisposable StartListening()
{
    var waveIn = new WasapiLoopbackCapture();

    var target = new WaveFormat(SampleRate, 16, 1);
    // The writer gets the target format, since the converted bytes are what gets written.
    var writer = new WaveFileWriter(@"c:\temp\xx.wav", target);

    Console.WriteLine(SampleRate);
    Console.WriteLine(BytesPerSecond);

    // ACM resampler from 16-bit PCM at the loopback rate/channels down to the
    // target format. Created once, outside the handler, so it isn't leaked on
    // every callback.
    var resampleStream = new NAudio.Wave.Compression.AcmStream(
        new WaveFormat(waveIn.WaveFormat.SampleRate, 16, waveIn.WaveFormat.Channels), target);

    waveIn.DataAvailable += (sender, args) =>
    {
        // Convert the 32-bit IEEE float samples to 16-bit PCM (half the bytes).
        byte[] newArray16Bit = new byte[args.BytesRecorded / 2];
        for (int i = 0, j = 0; i < args.BytesRecorded; i += 4, j += 2)
        {
            float value = BitConverter.ToSingle(args.Buffer, i);
            short two = (short)(value * short.MaxValue);
            newArray16Bit[j] = (byte)(two & 0xFF);
            newArray16Bit[j + 1] = (byte)((two >> 8) & 0xFF);
        }

        // Resample the 16-bit data down to the target format.
        Buffer.BlockCopy(newArray16Bit, 0, resampleStream.SourceBuffer, 0, newArray16Bit.Length);
        int sourceBytesConverted;
        var bytes = resampleStream.Convert(newArray16Bit.Length, out sourceBytesConverted);
        var converted = new byte[bytes];
        Buffer.BlockCopy(resampleStream.DestBuffer, 0, converted, 0, bytes);
        var a = new WaveInEventArgs(converted, bytes);

        _microphoneBuffer.Add(ByteString.CopyFrom(a.Buffer, 0, a.BytesRecorded));

        // Also dump the first five seconds to disk so the result can be checked by ear.
        if (writer != null)
        {
            writer.Write(a.Buffer, 0, a.BytesRecorded);
            if (writer.Position > target.AverageBytesPerSecond * 5)
            {
                writer.Dispose();
                writer = null;
                Console.WriteLine("Saved file");
            }
        }
    };
    waveIn.StartRecording();
    return waveIn;
}

But it doesn't work either.

I'm not sure if this is the right path; I tried converting the bit depth and sample rate but couldn't get it to work.

A code sample of a fix would be highly appreciated.
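
Update: here is an untested sketch of the direction I'm experimenting with now, replacing the manual loop and AcmStream with NAudio's sample-provider pipeline. The 16000 Hz target is an assumption and must match SampleRateHertz in the Google config; StereoToMonoSampleProvider assumes the loopback mix is stereo, and the explicit IEEE float format assumes the usual shared-mode mix format:

    // requires: using NAudio.Wave; using NAudio.Wave.SampleProviders; using Google.Protobuf;
    private IDisposable StartListening()
    {
        var waveIn = new WasapiLoopbackCapture();

        // Declare the buffer format explicitly as IEEE float so ToSampleProvider
        // can consume it (the loopback capture delivers 32-bit float samples).
        var floatFormat = WaveFormat.CreateIeeeFloatWaveFormat(
            waveIn.WaveFormat.SampleRate, waveIn.WaveFormat.Channels);
        var buffered = new BufferedWaveProvider(floatFormat)
        {
            DiscardOnBufferOverflow = true,
            ReadFully = false // Read returns only what has actually been captured
        };

        // float stereo -> mono -> 16 kHz -> 16-bit PCM (Google's LINEAR16)
        var mono = new StereoToMonoSampleProvider(buffered.ToSampleProvider())
        {
            LeftVolume = 0.5f, // average the two channels
            RightVolume = 0.5f
        };
        var resampled = new WdlResamplingSampleProvider(mono, 16000);
        var pcm16 = resampled.ToWaveProvider16();

        var chunk = new byte[8192];
        waveIn.DataAvailable += (sender, args) =>
        {
            buffered.AddSamples(args.Buffer, 0, args.BytesRecorded);
            // Drain everything the chain can produce and ship it to Google.
            int read;
            while ((read = pcm16.Read(chunk, 0, chunk.Length)) > 0)
            {
                _microphoneBuffer.Add(ByteString.CopyFrom(chunk, 0, read));
            }
        };
        waveIn.StartRecording();
        return waveIn;
    }

With this chain the streaming config would use SampleRateHertz = 16000 and mono audio.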

Noam
  • Hi, did you manage to find a solution? I have exactly the same problem and followed exactly the same path, but no luck either. – Shota Jan 20 '21 at 16:04
  • Yes, eventually, but it was so long ago that I don't exactly remember how. Here's the working solution project; peek into it and extract what you want: https://github.com/noam-honig/chat-translate-headset-extention – Noam Feb 26 '21 at 04:13
  • Thanks, I've also managed it the same way you did: using CSCore instead of NAudio solves the problem, as CSCore has native functionality for resampling the audio stream into the format Google expects. For anybody looking for the solution, use CSCore as provided in the link above; a sketch follows below. – Shota Feb 27 '21 at 20:46
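
For anyone who lands here, a minimal sketch of the CSCore approach the comments describe, assuming CSCore's SoundInSource and its fluent ChangeSampleRate/ToMono/ToWaveSource extensions (the 16000 Hz target must match the rate in the Google config):

    // requires: using CSCore; using CSCore.SoundIn; using CSCore.Streams; using Google.Protobuf;
    var capture = new WasapiLoopbackCapture(); // CSCore.SoundIn type, not the NAudio one
    capture.Initialize();

    var soundInSource = new SoundInSource(capture) { FillWithZeros = false };
    // Resample to 16 kHz, mix down to mono, and emit 16-bit PCM (LINEAR16).
    var converted = soundInSource
        .ChangeSampleRate(16000)
        .ToSampleSource()
        .ToMono()
        .ToWaveSource(16);

    var buffer = new byte[converted.WaveFormat.BytesPerSecond / 2];
    soundInSource.DataAvailable += (s, e) =>
    {
        int read;
        while ((read = converted.Read(buffer, 0, buffer.Length)) > 0)
            _microphoneBuffer.Add(ByteString.CopyFrom(buffer, 0, read));
    };
    capture.Start();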
