How can I detect intent and get the agent response as audio as well as text from Dialogflow CX using the Java API?

Question

I'm trying to develop a simple voice bot with Dialogflow CX and the Java API.

These are my dependencies in a Spring Boot 2.4.3 project:

...
        <dependency>
            <groupId>com.google.cloud</groupId>
            <artifactId>spring-cloud-gcp-starter</artifactId>
        </dependency>
        <dependency>
            <groupId>com.google.cloud</groupId>
            <artifactId>google-cloud-dialogflow-cx</artifactId>
            <version>0.6.1</version>
        </dependency>
...

I used https://github.com/googleapis/java-dialogflow-cx as a starting point, and everything seems to be working well so far, except the thing that matters the most.

When I send a text or event to my agent, it detects the intent and I get the response, but there is no audio output. So it seems that no text-to-speech is performed.

I'm building the request the same way as the documentation example:

QueryInput queryInput = QueryInput
                .newBuilder()
                .setLanguageCode("es-ES")
                // In CX, setText takes a TextInput rather than a plain String
                .setText(TextInput.newBuilder().setText("hola"))
                .build();

DetectIntentRequest request = DetectIntentRequest.newBuilder()
                .setSession(sessionName.toString())
                .setQueryInput(queryInput)
                .build();

DetectIntentResponse response = sessionsClient.detectIntent(request);

Response:

{
  "detectIntentResponse": {
    "text": "hola",
    "languageCode": "es",
    "responseMessages": [
      {
        "text": {
          "text": [
            "¡Buenos días!"
          ]
        }
      },
      {
        
      }
    ],
    "currentPage": {
      "name": "projects/test-project/locations/global/agents/9effb8aa-6b62-4fe6-9fd5-2f5e87265ee7/flows/00000000-0000-0000-0000-000000000000/pages/START_PAGE",
      "displayName": "Start Page"
    },
    "intent": {
      "name": "projects/test-project/locations/global/agents/9effb8aa-6b62-4fe6-9fd5-2f5e87265ee7/intents/00000000-0000-0000-0000-000000000000",
      "displayName": "Default Welcome Intent"
    },
    "intentDetectionConfidence": 1.0,
    "diagnosticInfo": {
      "Execution Sequence": [
        {
          "Step 1": {
            "InitialState": {
              "FlowState": {
                "Version": 0.0,
                "PageState": {
                  "Status": "ENTERING_PAGE",
                  "Name": "Start Page"
                },
                "Name": "Default Start Flow"
              },
              "MatchedIntent": {
                "Score": 1.0,
                "Type": "NLU",
                "Active": true,
                "DisplayName": "Default Welcome Intent",
                "Id": "00000000-0000-0000-0000-000000000000"
              }
            },
            "Type": "INITIAL_STATE"
          }
        },
        {
          "Step 2": {
            "Type": "STATE_MACHINE",
            "StateMachine": {
              "FlowState": {
                "Version": 0.0,
                "Name": "Default Start Flow",
                "PageState": {
                  "Name": "Start Page",
                  "Status": "TRANSITION_ROUTING"
                }
              },
              "TriggeredIntent": "Default Welcome Intent"
            }
          }
        },
        {
          "Step 3": {
            "FunctionExecution": {
              "Responses": [
                {
                  "text": {
                    "redactedText": [
                      "¡Buenos días!"
                    ],
                    "text": [
                      "¡Buenos días!"
                    ]
                  },
                  "responseType": "HANDLER_PROMPT",
                  "source": "VIRTUAL_AGENT"
                }
              ]
            },
            "Type": "FUNCTION_EXECUTION"
          }
        },
        {
          "Step 4": {
            "Type": "STATE_MACHINE",
            "StateMachine": {
              "FlowState": {
                "Version": 0.0,
                "PageState": {
                  "Name": "Start Page",
                  "Status": "TRANSITION_ROUTING"
                },
                "Name": "Default Start Flow"
              }
            }
          }
        }
      ],
      "Alternative Matched Intents": [
        {
          "Active": true,
          "Type": "NLU",
          "Id": "00000000-0000-0000-0000-000000000000",
          "DisplayName": "Default Welcome Intent",
          "Score": 1.0
        }
      ],
      "Transition Targets Chain": [
        
      ],
      "Triggered Transition Names": [
        "9db835de-3e94-4a2a-9b8d-4eda03039e5a"
      ]
    },
    "match": {
      "intent": {
        "name": "projects/test-project/locations/global/agents/9effb8aa-6b62-4fe6-9fd5-2f5e87265ee7/intents/00000000-0000-0000-0000-000000000000",
        "displayName": "Default Welcome Intent"
      },
      "resolvedInput": "hola",
      "matchType": "INTENT",
      "confidence": 1.0
    }
  }
}

In Dialogflow ES there is an option to enable Automatic Text to Speech, and as a result the output audio is included in the DetectIntentResponse, but I can't see any option like that in CX.

I have done several searches on Google and I'm unable to find anything useful.

So the question is: How can I detect intent and get the agent response as audio as well as text from Dialogflow CX using the Java API?

Sample code would be great!

Thank you in advance!

score 2 · Answer 1 · answered Mar 16 '21 at 18:24

According to the documentation:

"If the client wants to receive an audio response, it should also contain output_audio_config."

Even though I am not using StreamingDetectIntent, an OutputAudioConfig must be added to the request in order to receive audio in the response.

The code should then be something like:

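DetectIntentRequest request = DetectIntentRequest.newBuilder()
                .setSession(sessionName.toString())
                .setQueryInput(queryInput)
                // Ask Dialogflow CX to synthesize the agent response as MP3
                .setOutputAudioConfig(OutputAudioConfig.newBuilder()
                        .setAudioEncoding(
                           OutputAudioEncoding.OUTPUT_AUDIO_ENCODING_MP3)
                        .build())
                .build();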

DetectIntentResponse response = sessionsClient.detectIntent(request);

The response will then also contain the outputAudio I was looking for.

{
  "outputAudio": "//NExAAAAANIAAAAALYwEAA......THE AUDIO ...ngAUYYAP/",
  "outputAudioConfig": {
    "audioEncoding": "OUTPUT_AUDIO_ENCODING_MP3"
  }
}
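In the JSON view above, outputAudio is the audio payload encoded as base64. As a rough sketch of what can be done with it afterwards, the snippet below decodes such a string and writes the bytes to an .mp3 file using only the JDK. The class and method names (OutputAudioSaver, decodeOutputAudio, saveAudio) are made up for illustration, and the payload in main is a placeholder, not real audio. With the Java client itself no decoding should be needed, since response.getOutputAudio() returns a ByteString whose toByteArray() gives the raw bytes that could be passed straight to saveAudio.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Hypothetical helper, not part of the Dialogflow CX client library.
public class OutputAudioSaver {

    /** Decodes the base64 "outputAudio" value as it appears in the JSON view of the response. */
    static byte[] decodeOutputAudio(String base64Audio) {
        return Base64.getDecoder().decode(base64Audio);
    }

    /** Writes the audio bytes to the target file and returns the resulting file size in bytes. */
    static long saveAudio(byte[] audio, Path target) throws IOException {
        Files.write(target, audio);
        return Files.size(target);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder payload standing in for the real "outputAudio" string.
        String base64Audio = Base64.getEncoder().encodeToString(new byte[] {1, 2, 3, 4});

        byte[] audio = decodeOutputAudio(base64Audio);
        Path target = Files.createTempFile("agent-response", ".mp3");
        long written = saveAudio(audio, target);

        System.out.println("Wrote " + written + " bytes to " + target);
    }
}
```

The file extension should match whatever encoding was requested in the OutputAudioConfig (MP3 in this case).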

I hope it will be useful to someone.

Thank you!
