
I have a Google Home Mini and I'm trying to use it as a speech-to-text device. My plan is to have the device listen to what is said and publish that input to an MQTT broker so that my application can consume it.

I have found this, which returns the input as text, but all it tells me is that the data can be obtained; I have little to no clue how to make it publish this data as an MQTT message.

I also found this, but can't make it work, because it states "There's a very easy way to recognize custom phrases in Google Assistant, [...] I won't cover it here". And even Google's instructions (open "Create an Applet") seem to be outdated with respect to IFTTT, because the steps simply can't be followed in IFTTT's interface.

Here is a quick sketch of the architecture: [architecture sketch: user → Google Home → MQTT broker → application]

There are five arrows. The first one is, obviously, a physical process. The "Audio" and "Text" arrows are handled automatically by the hardware. The right-hand "MQTT Message" arrow is already working. So what I want help with is the "MQTT Message" arrow from "Google Home" to "MQTT Broker".

Thanks in advance.

Pedro Alves
  • I think the architecture you are looking for is user->Google Home (Voice) ... and then Google Home to Actions on Google (in the cloud). Actions on Google will then parse the audio and you can call a WebHook application via DialogFlow. This will be passed the text and perform an action. At the end, DialogFlow will return a language string that will be returned as audio to Google Home ... end of story. So no call from Google Home via MQTT, but you could call from the WebHook on GCP to MQTT or directly to another app. – Kolban Jan 16 '20 at 03:30
  • If I understand correctly, you are assuming I want to use Google Home to perform an action on some device. The thing is: all I want from it is speech-to-text for my application. Of course, I could use any mic and the Cloud API to do so, but my application requires a Google Home anyway and it's not meant to require a mic in order to work. I hope I'm making myself clear haha – Pedro Alves Jan 16 '20 at 05:36
  • Howdy Pedro. Think of your Google Home device as a microphone and speaker, and that's about it (for this story). You can write applications which receive audio input and perform "logic" based on what you say. The programming for that is done through Actions on Google and runs in the cloud. What I think I am hearing is that you want to utter some phrase/trigger and have some back-end application work on what you said. The Google Home device will receive audio, send it to Actions on Google, which will parse the audio and call an app with the text. All in the cloud. – Kolban Jan 16 '20 at 05:45
  • I think it could work for me. But how would the data (the text) reach the application? You see, the "application" is, in fact, a Lua script that runs inside the _actual_ application, which is a digital TV application. So, this Lua script is already capable of receiving (and handling) MQTT messages. Of course, in this new approach that wouldn't happen, but what _would_ happen? How would I send the data to this script? – Pedro Alves Jan 16 '20 at 05:56

2 Answers


The short answer to this is you don't (as you've described it).

The slightly longer answer is that you first have to move the arrow you are interested in into the cloud, and it's not an MQTT message.

[revised architecture diagram with the Action hosted in the cloud]

The Action box needs to be hosted on a publicly accessible machine (e.g. AWS/GCP/Azure/IBM Cloud) so that the Google platform knows where to find it.

Google have 2 different types of Actions: one for conversational-type interactions and one for controlling smart home devices. You've not mentioned what you are trying to do, so I can't say which one you really want.

Google have recently announced the Local SDK for interacting with smart home devices, which is slightly closer to the diagram you have included. This can only be used for device control and still can't send MQTT messages; it supports HTTP, raw UDP or TCP (you might be able to implement an MQTT client using the raw TCP, but it would be a lot of work and I'm not convinced the keep-alive would work).
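
As a rough illustration of the conversational route, here is a minimal sketch of a cloud-hosted fulfillment webhook that forwards the recognized text over MQTT. It assumes a Dialogflow (ES) Action, Flask and the paho-mqtt client; the broker hostname, topic and endpoint path are placeholders, not anything defined in the question.

```python
# Minimal sketch (assumptions): a Dialogflow ES fulfillment webhook that
# republishes the recognized user utterance to an MQTT broker.
from flask import Flask, request, jsonify   # pip install flask
import paho.mqtt.publish as publish         # pip install paho-mqtt

app = Flask(__name__)

MQTT_BROKER = "broker.example.com"   # placeholder: your broker's hostname
MQTT_TOPIC = "home/speech-to-text"   # placeholder: topic your app subscribes to

@app.route("/fulfillment", methods=["POST"])
def fulfillment():
    body = request.get_json(force=True)
    # Dialogflow ES puts the user's utterance in queryResult.queryText
    text = body.get("queryResult", {}).get("queryText", "")

    # Forward the text so the existing MQTT subscriber picks it up
    publish.single(MQTT_TOPIC, payload=text, hostname=MQTT_BROKER)

    # Return something for the Assistant to say back
    return jsonify({"fulfillmentText": "Got it."})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

This endpoint would be set as the fulfillment URL in the Dialogflow console, and it must be reachable from the internet, which is why it belongs on the cloud side of the diagram.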

hardillb
  • I like this. It really seems to be what I was looking for, considering the bridge between Google Home and the MQTT Broker wasn't clear to me AT ALL. Could you provide some more insight on how to configure such an Action so that it can retrieve the text and publish it via MQTT? I don't want to control any other smart home device, so I suppose it is the conversational type. I guess you would have a better idea of what I'm trying to accomplish by reading the comments on the question. Thanks! – Pedro Alves Jan 16 '20 at 10:20
  • You will have to build an application using the [Actions SDK](https://developers.google.com/assistant/conversational/overview#fulfillment_using_actions_sdk) and host this somewhere. – hardillb Jan 16 '20 at 10:24
  • Thank you. This is a great answer. I also like that it mentions Local SDK. I wonder though if the cloud fulfilment is needed after it has been used to setup the device? E.g. can I host a server somewhere, register my device, configure local fulfilment, shut that server down and still be able to control my lights? Will such setup survive Google Home device reboots? What about factory resets? Or does the server need to be reachable by Google servers after the initial setup as well? – Sergiy Belozorov Oct 05 '20 at 22:50
  • No, you still need the cloud service to respond to the Status request messages and to update the Home Graph. – hardillb Oct 06 '20 at 07:57

I think I got what you need:

  1. Configure the Google Assistant to parse your speech, then connect it to IFTTT (as I have already done in the past, it's very easy) to send HTTP requests.
  2. Now create a local web server that understands these requests from IFTTT and publishes them to your broker (see the sketch below).

And that's all!
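
As a rough sketch of step 2, assuming Flask and the paho-mqtt client, and assuming the IFTTT Webhooks action is configured to POST a JSON body such as {"text": "<spoken text>"}; the endpoint path, broker address and topic below are placeholders.

```python
# Minimal sketch (assumptions): a small local web server that accepts the
# HTTP request sent by IFTTT and republishes the text to the MQTT broker.
from flask import Flask, request            # pip install flask
import paho.mqtt.publish as publish         # pip install paho-mqtt

app = Flask(__name__)

MQTT_BROKER = "localhost"            # placeholder: broker the app already uses
MQTT_TOPIC = "home/speech-to-text"   # placeholder: topic the app subscribes to

@app.route("/ifttt", methods=["POST"])
def ifttt_hook():
    # Assumes the IFTTT Webhooks action POSTs JSON like {"text": "..."}
    body = request.get_json(force=True, silent=True) or {}
    text = body.get("text", "")

    publish.single(MQTT_TOPIC, payload=text, hostname=MQTT_BROKER)
    return "", 204

if __name__ == "__main__":
    # The server must be reachable from the internet for IFTTT to call it
    app.run(host="0.0.0.0", port=8080)
```

Note that for IFTTT to reach this server you would need to expose it publicly (port forwarding or a tunnel), which is a trade-off compared with the cloud-hosted Action approach in the other answer.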

colidyre