
I have to carry out a project involving a NAO robot programmed in Python. The goal is to give NAO some knowledge about what it is shown.

For example:

  • A person shows NAO a picture (drawn by hand on a whiteboard)
  • The person says "House" (let's say the person draws a house)
  • NAO now knows that the picture shown represents a house
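The teaching loop described in the steps above can be sketched as a small label store. This is a minimal, hypothetical sketch: the class `PictureMemory`, and the assumption that each drawing has already been reduced to a fixed-length feature vector by some image-processing step, are illustrative only and not part of NAOqi. A simple nearest-neighbour lookup stands in for whatever image recognizer you actually use.

```python
import math


def _distance(a, b):
    # Euclidean distance between two equal-length feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class PictureMemory(object):
    """Hypothetical store linking spoken labels to drawings.

    Each drawing is assumed to be already reduced to a fixed-length
    feature vector by an image-processing step that is not shown here.
    """

    def __init__(self):
        self._examples = []  # list of (feature_vector, label) pairs

    def teach(self, features, label):
        # The person shows a drawing and says its name: remember the pair.
        self._examples.append((list(features), label))

    def recall(self, features):
        # Return the label of the closest stored drawing, or None if
        # nothing has been taught yet.
        if not self._examples:
            return None
        best_features, best_label = min(
            self._examples, key=lambda ex: _distance(ex[0], features))
        return best_label


memory = PictureMemory()
memory.teach([0.9, 0.1, 0.0], "house")
memory.teach([0.1, 0.8, 0.3], "tree")
print(memory.recall([0.85, 0.2, 0.05]))  # house
```

Note that the `label` passed to `teach` is exactly the spoken word, which is the piece a closed-vocabulary recognizer cannot supply for words it has never been given.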

The problem I have encountered is in the speech recognition module: only words in a predefined vocabulary can be recognized. But in my project, a person draws on a whiteboard and tells NAO what is drawn there. This means I cannot know in advance what the person is going to draw, so I cannot set the vocabulary beforehand.

My starting point is this tutorial. As you can see from the tutorial, only words belonging to the vocabulary can be recognized, as in these lines of code:

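from naoqi import ALProxy
asr = ALProxy("ALSpeechRecognition", "nao.local", 9559)  # "nao.local" is a placeholder for your robot's address
wordList = ["yes", "no", "hello Nao", "goodbye Nao"]
asr.setWordListAsVocabulary(wordList)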

During the recognition, an event called WordRecognized is raised. It has this structure:

Event: "WordRecognized"
callback(std::string eventName, AL::ALValue value, std::string subscriberIdentifier)

It is raised when one of the words specified with ALSpeechRecognitionProxy::setWordListAsVocabulary() has been recognized. When no word is currently recognized, this value is reinitialized.
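As far as I know, the value delivered with WordRecognized is a flat list alternating phrase and confidence, e.g. ["hello Nao", 0.62, "no", 0.21]; treat that layout as an assumption and check it against your NAOqi version. The helper below is a sketch you could call from your event callback to pick the most confident phrase. The function name `best_recognized_word` and the 0.4 threshold are hypothetical choices, not NAOqi API.

```python
def best_recognized_word(value, min_confidence=0.4):
    """Pick the most confident phrase from a WordRecognized value.

    Assumes `value` is a flat list [phrase_1, conf_1, phrase_2, conf_2, ...].
    Returns None if the list is empty (e.g. after reinitialization) or if
    no non-empty phrase reaches `min_confidence`.
    """
    # Pair each phrase with the confidence that follows it.
    pairs = zip(value[0::2], value[1::2])
    candidates = [(conf, phrase) for phrase, conf in pairs
                  if phrase and conf >= min_confidence]
    if not candidates:
        return None
    confidence, phrase = max(candidates)
    return phrase


print(best_recognized_word(["hello Nao", 0.62, "no", 0.21]))  # hello Nao
print(best_recognized_word(["yes", 0.15]))                    # None
```

Whatever threshold you choose, this can only ever return one of the phrases you put in wordList, which is precisely the limitation described above.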

So I suppose the key to the solution is here, but I need some help. How could I solve this problem? Is there any better documentation I can refer to?

Thanks in advance!

Francesco Sgaramella

1 Answer


The problem is that NAO's speech recognition module is proprietary, and I highly doubt you can do such things with it.

However, if you consider the ROS platform and an open-source engine like CMUSphinx, you can definitely do what you want. It's easy to include a placeholder word in a grammar; the placeholder is matched against an unknown word, which can later be added to the dictionary.
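The placeholder idea can be sketched in two parts: a JSGF grammar (a grammar format CMUSphinx accepts) in which the object name is either a known word or a placeholder token, and a helper that appends a newly learned word to a CMU-style pronunciation dictionary ("word PH ON ES" per line). The token `UNKNOWN_WORD`, the carrier phrase "this is a", the function names, and the phone strings are all hypothetical. In a real system the placeholder needs its own acoustic model (for example a phone loop), and the phone sequence of the new word has to come out of the recognizer itself.

```python
def build_jsgf(known_words, placeholder="UNKNOWN_WORD"):
    """Build a JSGF grammar for 'this is a <object>'.

    <object> is any known word or the placeholder token, which is assumed
    to stand in for an out-of-vocabulary word.
    """
    alternatives = " | ".join(list(known_words) + [placeholder])
    return (
        "#JSGF V1.0;\n"
        "grammar teach;\n"
        "public <teach> = this is a <object>;\n"
        "<object> = " + alternatives + ";\n"
    )


def add_to_dictionary(dict_lines, word, phones):
    """Return dictionary lines with `word` appended in CMU format.

    If `word` already exists, the new pronunciation is added as an
    alternative with a '(2)', '(3)', ... suffix.
    """
    existing = [line.split()[0] for line in dict_lines if line.strip()]
    count = sum(1 for w in existing if w == word or w.startswith(word + "("))
    entry = word if count == 0 else "%s(%d)" % (word, count + 1)
    return dict_lines + [entry + " " + " ".join(phones)]


print(build_jsgf(["house", "tree"]))
print(add_to_dictionary(["tree T R IY"], "house", ["HH", "AW", "S"]))
# ['tree T R IY', 'house HH AW S']
```

Once a new word is in the dictionary, it can be added to the grammar's known words, so the next time it is spoken it is recognized directly instead of falling into the placeholder.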

Learning vocabulary through voice interaction is a highly complex research problem, but it has been done before. As an example, you can read this publication:

A. Laurent, T. Merlin, S. Meignier, Y. Esteve, P. Deleglise, "Combined systems for automatic phonetic transcription of proper nouns", LREC 2008.

http://www.lrec-conf.org/proceedings/lrec2008/pdf/455_paper.pdf

The catch is that you will need to work with the recognizer at a very low level.

Nikolay Shmyrev