
I need to add Speech to Text capability to an MS bot written in C#.

I'm new to C# (although I do know C++) and was wondering if I can use JS for the same. I'm quite familiar with JavaScript and have written Speech to Text module using SpeechSynthesis API for a bot that was written in Python.

Or is it better that I figure C# out? (I'd have to use another API for this, say Bing Speech API).

Do share your thoughts.

Parvathy Sarat

1 Answer


Depending on what you're doing, there are a few alternatives. Let's say the choice is between predefined commands and free dictation.

Predefined commands would be handled with a switch statement or if statements.

The first thing you would do is add a reference to the System.Speech assembly. Once that is referenced, you bring the namespace in with a using directive:

using System.Speech.Recognition;

Then you would declare some classes and variables.

SpeechRecognitionEngine sr = new SpeechRecognitionEngine();

Then you can use a predefined text file to hold the commands.

You would also need to set the input to the default microphone, set the recognizer to RecognizeMode.Multiple, and read the text command list to pull in the string values.
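
Putting those steps together, here is a minimal sketch; the commands.txt file name and the example phrase are assumptions, and it needs a reference to the Windows-only System.Speech assembly:

using System;
using System.IO;
using System.Speech.Recognition;

class CommandListener
{
    static void Main()
    {
        var sr = new SpeechRecognitionEngine();

        // Predefined commands live in a text file, one phrase per line.
        string[] commands = File.ReadAllLines("commands.txt");
        sr.LoadGrammar(new Grammar(new GrammarBuilder(new Choices(commands))));

        // Default microphone as input; keep recognizing until told to stop.
        sr.SetInputToDefaultAudioDevice();
        sr.SpeechRecognized += (s, e) =>
        {
            switch (e.Result.Text)
            {
                case "how is the weather":
                    Console.WriteLine("Fetching the weather...");
                    break;
                default:
                    Console.WriteLine("Heard: " + e.Result.Text);
                    break;
            }
        };
        sr.RecognizeAsync(RecognizeMode.Multiple);

        Console.ReadLine(); // keep the console app alive while listening
    }
}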

You can also add synthesis to the code and have your computer talk back to you.
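
For the talk-back part, System.Speech ships a synthesizer alongside the recognizer; a minimal sketch:

using System.Speech.Synthesis;

var synth = new SpeechSynthesizer();
synth.SetOutputToDefaultAudioDevice();
synth.Speak("I heard you."); // Speak blocks until playback finishes; SpeakAsync is the non-blocking variant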

Note that free dictation follows the same process, with a little more code. See this page for more examples on speech: https://msdn.microsoft.com/en-us/library/office/hh361683(v=office.14).aspx
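
As a rough illustration of how little extra code dictation takes, you can swap the phrase-list grammar for the built-in DictationGrammar (same System.Speech assumptions as above):

using System.Speech.Recognition;

var sr = new SpeechRecognitionEngine();
sr.LoadGrammar(new DictationGrammar()); // free-text dictation instead of a fixed phrase list
sr.SetInputToDefaultAudioDevice();
sr.SpeechRecognized += (s, e) => System.Console.WriteLine(e.Result.Text);
sr.RecognizeAsync(RecognizeMode.Multiple);
System.Console.ReadLine();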

Halonic
  • Also, to get a full view of what you can do, look at this. My code is about 8,000 lines at the moment, but this is all completely done in C#, with predefined commands, a little machine learning, and API XML calls, so this is what you can do with speech in C#: https://youtu.be/dFiiSaRBzgw – Halonic May 13 '18 at 14:32
  • Thanks a ton! But using Speech.Recognition (which I read is built into the .NET Framework) would require training on my desktop; would it be possible for a client to run it? What would the difference be compared with using the Speech to Text APIs from Microsoft services? – Parvathy Sarat May 14 '18 at 09:38
  • System.Speech is just a reference, so the computer, whether desktop or laptop, has the listen and speech functions. As far as chatting and phrases, you can build your own system. For example, the text document can hold all the predefined commands you say, e.g. "how is the weather", and then in the code you would pull the XML of the weather. Now, in theory, MS Speech is the basis of all speech applications. From there you can create SRGS XML speech files, or you can use the text-based option. – Halonic May 14 '18 at 13:02
  • This is my field; I do it every single day. MS Speech holds all the words from the '80s. What I mean is that MS started developing the speech program, then saw it as a dead end, so development came to a stop. If you wanted to create your own speech system you would need to record a .wav file of every single word known to man. The computer does not understand words, only sound waves. That is what MS did: it's a database of sound waves. It's the same as when you speak and listen: your brain hears the sound wave and in nanoseconds puts the sound with an image. That's how it works. – Halonic May 14 '18 at 13:07
  • As far as clients' computers, you will just have to use identifiers to point to the files, such as: System.Environment.MachineName; (I believe that is it). What this does is get the computer name. Store it as string user; then you can do this: @"desktop\" + user + "\....rest of location". What it does is keep your path, and the user is "John Doe" (the client's computer). All of the program begins at "\...rest of location". So on a client computer it would look like this: @"c:\desktop\" + user + "\...rest of location"; (see the sketch after these comments). – Halonic May 14 '18 at 13:14
  • Thanks a lot! One piece of information I left out is that it needs to be hosted on Azure. System.Speech is not supported on Azure, as I have found, so I would probably need to depend on the Bing Speech API or its REST API for this. – Parvathy Sarat May 15 '18 at 04:13
  • The Bing Speech API is good, in my view, for applications with a usage of 5,000 calls or fewer. What I mean is, if you make a call, say a generic search like "How to grow a beard", then for the month you have used 1 of your 5,000 free calls. Now let's say you have 10,000 calls a month: you get 5,000 free, then it's $4 per 1,000 calls, so you would pay $20 for the remaining calls. That can get really expensive with big API usage; at 100,000 calls a month the 95,000 paid calls would cost $380 (5,000 free). – Halonic May 15 '18 at 12:22
  • Also think about this: in theory, MS Speech was developed in the SAPI 1 era, 1995. The latest was 2005, which was SAPI 5. If you read this https://en.m.wikipedia.org/wiki/Microsoft_Speech_API it will back up my theory. The Bing API was founded on MS Speech; they just developed it further. You can build your own speech engines as long as they conform to the structure. Now the interesting thing is, the Bing API uses MS Speech and also has an algorithm to record new phrases, which are added to the database, making it superior to plain MS Speech. So in theory Bing and Google are derived from plain MS Speech. – Halonic May 15 '18 at 12:31
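
As a footnote to the per-user path discussion in the comments above, here is a small sketch of the idiomatic way to build such a path; the folder layout after the user name is an assumption standing in for the elided "...rest of location":

using System;
using System.IO;

// Runtime identifiers for the machine and the logged-in user.
string machine = Environment.MachineName; // e.g. "DESKTOP-ABC123"
string user    = Environment.UserName;    // e.g. "JohnDoe" on the client's machine

// Path.Combine avoids hand-gluing backslashes into the path.
string path = Path.Combine(@"C:\Users", user, "Desktop", "rest of location");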