
I'm writing Smart Home software for my bachelor's degree that will only simulate the actual house, but I'm stuck on the NLP part of the project. The idea is to have the client listen for voice input (already done), transform it into text (done) and send it to the server, which does all the heavy lifting / decision making.

So all my inputs will be fairly short (like "please turn on the porch light"). Based on this, I want to decide which object to act on, and how to act on it. So I came up with a few things to do in order to write something reasonably efficient.

  1. Get rid of unnecessary words (in the previous example, "please" and "the" are words that don't change the meaning of what needs to be done; but if I say "turn off my lights", "my" does carry a fairly important meaning).
  2. Deal with synonyms ("turn on lights" should do the same as "enable lights" -- I know it's a silly example). I'm guessing the only option is to have some kind of dictionary (XML maybe), and just keep a list of possible words for each particular object in the house.
  3. Detecting the verb and its object. In "turn on the lights", "turn on" is the verb and "lights" is its object. I need a good way to detect this (a rough sketch of what I mean is below, after this list).
  4. General implementation. How are these things usually developed, in terms of algorithms? I only managed to find one article about NLP in smart homes, which was very vague (and had poor English). Any links are welcome.
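
To make steps 1–3 concrete, here is the kind of rough sketch I have in mind (Python with NLTK; the stopword and synonym lists are just placeholders I made up):

```python
import nltk  # requires nltk.download('punkt') and nltk.download('averaged_perceptron_tagger')

# Placeholder lists -- the real ones would live in a config file (XML maybe).
STOPWORDS = {"please", "the", "a", "an"}                 # step 1: safe to drop
SYNONYMS = {"enable": "turn on", "disable": "turn off"}  # step 2: canonical forms

def parse_command(text):
    text = text.lower()
    for alias, canonical in SYNONYMS.items():  # normalize synonyms first
        text = text.replace(alias, canonical)
    tokens = [t for t in nltk.word_tokenize(text) if t not in STOPWORDS]
    tagged = nltk.pos_tag(tokens)              # step 3: POS tags
    verbs = [w for w, tag in tagged if tag.startswith("VB")]
    nouns = [w for w, tag in tagged if tag.startswith("NN")]
    return verbs, nouns

print(parse_command("Please turn on the porch light"))
# e.g. (['turn'], ['porch', 'light']) -- though POS taggers often mis-tag
# imperative verbs as nouns, so this step needs more care than it looks.
```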
Eduard Luca
  • how much time do you have to spend on the NLP bits? it might make sense to make use of tools already available for speech recognition like open source [sphinx](http://cmusphinx.sourceforge.net/). take a look at these [two](http://stackoverflow.com/questions/9406093/nltk-thinks-that-imperatives-are-nouns/9572724#9572724) [questions](http://stackoverflow.com/questions/18681052/) on imperatives (which assume that you've got the voice-to-text recognition done already). pattern.en might be especially helpful. also - I'm not clear what the question is for your 4th point. – arturomp Sep 10 '13 at 02:01
  • @amp I have about 1 year to go, and I don't want to use already existing tools, because then my project wouldn't have the complicated part that I need to get a high grade :) I'd rather write something up myself, even if it takes me months to do so. – Eduard Luca Sep 11 '13 at 13:16

3 Answers


If you don't have a lot of time to spend on the NLP problem, you can use the Wit API (http://wit.ai), which maps natural language sentences to JSON:

[Screenshot: the Wit console mapping an example sentence to a JSON intent]

It's based on machine learning, so you need to provide example sentences plus their JSON output to configure it for your needs. It should be much more robust than grammar-based approaches, especially because the speech-to-text engine might make mistakes that would break your grammar (whereas the machine learning module can still recover the meaning of the sentence).
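
For illustration, a call to Wit's HTTP endpoint could look roughly like this (the token is a placeholder, and the exact JSON shape depends on the intents and entities you define):

```python
import requests

WIT_TOKEN = "YOUR_SERVER_ACCESS_TOKEN"  # placeholder -- taken from your Wit console

def interpret(sentence):
    # The /message endpoint maps a sentence to the intents you trained.
    resp = requests.get(
        "https://api.wit.ai/message",
        params={"q": sentence},
        headers={"Authorization": "Bearer " + WIT_TOKEN},
    )
    resp.raise_for_status()
    return resp.json()

print(interpret("please turn on the porch light"))
# e.g. {"intents": [{"name": "light_on", "confidence": 0.98}], "entities": {...}}
# (the intent and entity names are whatever you configured)
```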

Blacksad
  • That's **exactly** what I want; however, I can't use an external service -- I want to create that service :) Do you have some tips on that? – Eduard Luca Sep 11 '13 at 13:12

For your project, I would suggest you look into the Stanford Parser.

  1. From your problem definition, I guess you don't need anything other than verbs and nouns. The Stanford Parser generates POS (part-of-speech) tags that you can use to prune the words you don't need.

  2. For this, I can't think of any better option than what you have in mind right now.

  3. For this, you can again use the grammatical dependency structure from the Stanford Parser, and I am fairly sure it is good enough to tackle this problem (see the sketch after this list).

  4. This is where the research part of your project lies. I guess you can find enough patterns using grammatical dependencies and POS tags to come up with an algorithm for your problem. I highly doubt that any algorithm will handle every kind of input sentence (structured and unstructured), but something that is more than 85% accurate should be good enough for you.
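
A minimal sketch of points 1 and 3, assuming a Stanford CoreNLP server is running locally (NLTK wraps the Stanford tools; the port and the exact relation names are version-dependent):

```python
from nltk.parse.corenlp import CoreNLPDependencyParser

# Assumes a Stanford CoreNLP server was started separately, e.g.:
#   java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
parser = CoreNLPDependencyParser(url="http://localhost:9000")

parse, = parser.raw_parse("turn on the porch light")

# Point 1: prune everything except verbs and nouns using the POS tags.
content = [(node["word"], node["tag"]) for node in parse.nodes.values()
           if node["tag"] and node["tag"].startswith(("VB", "NN"))]
print(content)  # e.g. [('turn', 'VB'), ('porch', 'NN'), ('light', 'NN')]

# Point 3: the dependency triples link the action verb to the device noun.
for governor, relation, dependent in parse.triples():
    print(governor, relation, dependent)
# e.g. (('turn', 'VB'), 'obj', ('light', 'NN')) -- the object relation may be
# called 'dobj' or 'obj' depending on the CoreNLP version.
```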

Prateek

First, I would construct a list of all possible commands (not every possible way to say a command, just the actual function itself: "kitchen light on" and "turn on the light in the kitchen" are the same command) based on the actual functionality the smart house has available. I assume there is a finite number of these, on the order of hundreds at most. Assign each one some sort of identifier code.

Your job then becomes to map an input of:

  • a sentence of English text
  • location of speaker
  • time of day, day of week
  • any other input data

to an output of a confidence level (0.0 to 1.0) for each command.

The system will then execute the best match command if the confidence is over some tunable threshold (say over 0.70).
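
As a skeleton, that dispatch layer might look like this (the command IDs, input fields, and scoring stub are all placeholders; a trained model would replace the scorer):

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.70  # tunable

@dataclass
class Request:
    text: str     # the recognized sentence
    room: str     # location of the speaker
    hour: int     # time of day
    weekday: int  # day of week

# Identifier codes for the discrete set of commands.
COMMANDS = ["porch_light_on", "porch_light_off", "kitchen_light_on"]

def score(request, command):
    """Confidence in [0.0, 1.0] that `request` means `command`.
    A trained model belongs here; this stub just counts word overlap."""
    words = set(request.text.split())
    parts = command.split("_")
    return len(words & set(parts)) / len(parts)

def dispatch(request):
    scores = {c: score(request, c) for c in COMMANDS}
    best = max(scores, key=scores.get)
    if scores[best] >= CONFIDENCE_THRESHOLD:
        print("execute:", best)  # would call into the house simulation
    else:
        print("no confident match for:", request.text)

dispatch(Request("turn on the porch light", room="hall", hour=21, weekday=4))
```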

From here it becomes a machine learning application. There are a number of different approaches (and furthermore, approaches can be combined by having them compete based on features of the input).

To start with, I would work through the NLP book by Jurafsky/Manning from Stanford. It is a good survey of current NLP algorithms.

From there you will get some ideas about how the mapping can be machine-learned and, more importantly, how natural language can be broken down into a mathematical structure suitable for machine learning.

Once the text is semantically analyzed, the simplest ML algorithms to try first would be supervised ones. To generate training data, have a normal GUI: speak your command, then press the corresponding command manually. That forms a single supervised training case. Make a large number of these and set some aside for testing. It is also unskilled work, so other people can help. You can then use these as the training set for your ML algorithm.
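
A minimal sketch of that supervised setup with scikit-learn (the four training cases are made up; a real set would need far more, collected with the GUI described above):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each (sentence, command-id) pair is one supervised case from the GUI.
sentences = ["turn on the porch light", "porch light on please",
             "switch off the porch light", "kill the porch light"]
labels = ["porch_light_on", "porch_light_on",
          "porch_light_off", "porch_light_off"]

model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(sentences, labels)

# Per-command confidence, as described above: execute the best match
# only if it clears the tunable threshold.
probs = model.predict_proba(["please turn the porch light on"])[0]
for command, p in zip(model.classes_, probs):
    print(command, round(p, 2))
```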

Andrew Tomazos
  • nice points! looking through Jurafsky/Manning + corpus creation + training is a whole project by itself, though! :P this is a principled approach, which I appreciate, but perhaps not the most straightforward? – arturomp Sep 10 '13 at 02:03
  • @amp: I think that a classic structured approach, where you try to manually write code that semantically analyzes the input text with a manually designed object model, mapping nouns to objects in the house and verbs to actions, etc, etc - will take much longer to implement and give worse results than a "statistical" ML approach. I am not sure if that's what you mean by the most straightforward way? – Andrew Tomazos Sep 10 '13 at 17:47
  • I agree that doing this manually would take work, although I doubt it would perform that much worse given that, as you said, there is a finite number of cases! (and that there is likely not that much data to train on.) In any case, that's not my main point - what I mean is that there are already ML/NLP libraries out there that can be used, instead of trying to build and train your own after reading Jurafsky/Manning! (although reading it is great if you're _really_ going into NLP, not just using it as a component in a project.) – arturomp Sep 10 '13 at 18:22
  • @amp: Sure, what parts of the system you write yourself, and what parts you take off the shelf, I don't really speak to. I'm suggesting an overall system architecture, and if you can build it with third party libraries so be it. – Andrew Tomazos Sep 10 '13 at 18:27