I'd like to know what the best solution would be for my problem.
We are currently looking to do keyword spotting without relying on speech-to-text or off-the-shelf keyword recognition, because of the accents and dialects involved.
We would like to process sound files that can be quite long and check each one against a list of keywords to determine whether those keywords occur. We can also train models for those keywords on recordings in our own accents, so that the models potentially fit our speakers better.
What would the best solution to this be? My boss's idea is to look for similarity between spectrograms, but I'm not sure what the most effective way to approach this would be.
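To make the spectrogram idea concrete, this is roughly what I understand it to mean. It is only a sketch in Python with NumPy, not something we have running: the function names (`log_spectrogram`, `dtw_cost`, `spot_keyword`), the frame sizes, and the choice of dynamic time warping with cosine distance are all my own assumptions. The idea is to slide a recorded example of the keyword over the long recording and report the position where the two spectrograms align most cheaply.

```python
import numpy as np


def log_spectrogram(signal, frame_len=256, hop=128):
    """Return a (frames x bins) log-magnitude spectrogram using a Hann window."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1)))


def dtw_cost(a, b):
    """Length-normalised dynamic time warping cost between two frame sequences."""
    # Cosine distance between every frame of a and every frame of b.
    a_n = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-9)
    b_n = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-9)
    dist = 1.0 - a_n @ b_n.T

    # Accumulated cost table with the usual three allowed steps.
    acc = np.full((len(a) + 1, len(b) + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(acc[i - 1, j],
                                                 acc[i, j - 1],
                                                 acc[i - 1, j - 1])
    return acc[-1, -1] / (len(a) + len(b))


def spot_keyword(recording, template, hop_frames=4):
    """Slide the template over the recording; return (best_start_frame, best_cost)."""
    rec = log_spectrogram(recording)
    tpl = log_spectrogram(template)
    best_start, best_cost = None, np.inf
    for start in range(0, len(rec) - len(tpl) + 1, hop_frames):
        cost = dtw_cost(rec[start:start + len(tpl)], tpl)
        if cost < best_cost:
            best_start, best_cost = start, cost
    return best_start, best_cost
```

A keyword would count as "found" when the best cost drops below some threshold, which we would have to tune on our own recordings. I picked time warping because speakers say the same word at different speeds, but I don't know whether raw spectrogram frames are robust enough across accents, which is really what I'm asking.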
We mainly work in C# but are willing to use any language that best solves our issue.
I tried using PocketSphinx but could not get it working properly. It still seems to try to do speech-to-text, which won't work well for us, as our country has 11 languages, each with different accents.