Detect fluency from google speech api results

Question

I'm trying to determine the fluency of a speaker using the Google Speech-to-Text API.

So far I have found that the API (betav1) can return the time taken to speak each word (its start time and end time).

And from Wikipedia,

Oral fluency or speaking fluency is a measurement both of production and reception of speech, as a fluent speaker must be able to understand and respond to others in conversation. Spoken language is typically characterized by seemingly non-fluent qualities (e.g., fragmentation, pauses, false starts, hesitation, repetition) because of ‘task stress.’ How orally fluent one is can therefore be understood in terms of perception, and whether these qualities of speech can be perceived as expected and natural (i.e., fluent) or unusual and problematic (i.e., non-fluent)

I can see that we can derive pauses, repetitions, etc. from the word-level results of the API. But measuring them relative to anything is difficult, as I can't find any standard reference values.
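As a rough sketch of turning the word timings into numbers: the helper below assumes you have already copied each recognized word's text and start/end time (in seconds) from the API response into a list. It computes speech rate, the silent gaps between words, and immediate repetitions. All names here are hypothetical, and the 0.5 s pause threshold is an arbitrary assumption, not a standard value.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Word:
    """One recognized word with its timing in seconds (hypothetical container)."""
    text: str
    start: float
    end: float


def _norm(text: str) -> str:
    # lower-case and strip punctuation so "I" and "i," compare equal
    return text.lower().strip(".,!?")


def fluency_features(words: List[Word], pause_threshold: float = 0.5) -> Dict[str, float]:
    """Simple fluency indicators from word-level timestamps.

    pause_threshold is an assumed cut-off (seconds) for a silent gap to count as a pause.
    """
    if len(words) < 2:
        # fewer than two words: no gaps or repetitions to measure
        return {"speech_rate_wpm": 0.0, "pause_count": 0, "mean_pause_s": 0.0,
                "longest_pause_s": 0.0, "repetitions": 0}
    pairs = list(zip(words, words[1:]))
    # silent gap between the end of one word and the start of the next
    gaps = [nxt.start - cur.end for cur, nxt in pairs]
    pauses = [g for g in gaps if g >= pause_threshold]
    # immediate word repetitions such as "I I want"
    repetitions = sum(1 for cur, nxt in pairs if _norm(cur.text) == _norm(nxt.text))
    duration = words[-1].end - words[0].start
    return {
        "speech_rate_wpm": 60.0 * len(words) / duration if duration > 0 else 0.0,
        "pause_count": len(pauses),
        "mean_pause_s": sum(pauses) / len(pauses) if pauses else 0.0,
        "longest_pause_s": max(pauses, default=0.0),
        "repetitions": repetitions,
    }
```

For example, "I I want coffee" with a 0.9 s gap before "want" yields one pause and one repetition. These numbers still need a reference scale before they say anything about fluency.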

Is there a proper approach to achieve this? Can anyone give a guideline for detecting fluency from the Google API results (or any other valid approach using open-source speech libraries or external software)?

It's fine if I'm going in completely the wrong direction; I just need a proper guideline to implement the feature.

@NikolayShmyrev, oh crap. It comes a little late for us. We already implemented it using Google Speech. But damn, this would have been so easy :( — Sadi Mahmud, Jun 28 '19 at 17:43
@SadiMahmud Can you please share how you used Google Speech to build a fluency test? — Anuj Gupta, Jan 13 '21 at 11:48
@NikolayShmyrev SpeechAce works only for native English speakers of US/UK — Anuj Gupta, Jan 18 '21 at 05:13

score 0 · Answer 1 · answered Jan 20 '21 at 02:33

It really depends on the data you have. I'm not familiar with the Google Speech-to-Text API. However, there are a few alternative options to achieve what you want, depending on the structure of the data.

If the data is structured (i.e. a table of words and values corresponding to properties of those words), you could run a regression algorithm (e.g. random forest regression) to estimate the degree of fluency on a continuous scale, or a classification algorithm (e.g. random forest or multinomial logistic regression) to predict a category of fluency (e.g. very disfluent, somewhat disfluent, normal, somewhat fluent, very fluent).
If the data is unstructured (e.g. a recording of a phrase), then you could try a neural network in Keras/TensorFlow that aims to classify different phrases as fluent or disfluent.
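To make the structured-data option concrete, here is a minimal sketch of a binary (fluent vs. disfluent) classifier over per-recording features such as mean pause length and repetition rate. It swaps in a plain logistic regression trained by gradient descent so that it runs without third-party packages; in practice you would more likely use scikit-learn's `RandomForestClassifier` or `LogisticRegression`. The feature choice, the toy data and the labels are made up for illustration; you would need your own recordings rated by human listeners.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: List[float], x: Sequence[float]) -> float:
    # w holds one weight per feature followed by the bias term;
    # zip stops at len(x), so the bias is added separately
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    return sigmoid(z)


def train_logistic(X: List[Sequence[float]], y: List[int],
                   lr: float = 1.0, epochs: int = 5000) -> List[float]:
    """Fit binary logistic regression by full-batch gradient descent on log-loss."""
    n = len(X)
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # d(log-loss)/dz for this sample
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Made-up toy data: [mean pause length (s), repetitions per word]
# label 1 = rated fluent, 0 = rated disfluent
X_train = [[0.2, 0.0], [0.3, 0.05], [0.25, 0.02],
           [1.2, 0.3], [1.5, 0.4], [1.0, 0.25]]
y_train = [1, 1, 1, 0, 0, 0]

weights = train_logistic(X_train, y_train)
```

`predict_proba(weights, features)` then gives the estimated probability that a new recording is fluent. Extending this to the five categories above would mean a multinomial model or an ordinal regression instead of a single sigmoid.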
