I want to create an app that records what you say into the microphone and extracts all the words.
I know this is a problem many companies and individuals are working on, but I am not sure how close we are to having tools that do this well.
Also, are there any publicly available tools for this? I am hoping that Google Assistant, Apple Siri, or a similar service provides an API that lets me upload an audio clip and get back the words that were spoken.