I would like to implement the library PocketSphinx in my Android project but I fail with it since nothing happens. It doesn't work and I don't get any errors.
This is how I tried it:
- Added pocketsphinx-android-5prealpha-release.aar to
/app/libs
- Added assets.xml to
/app
- Aded the following to
/app/build.gradle
:
ant.importBuild 'assets.xml'
preBuild.dependsOn(list, checksum)
clean.dependsOn(clean_assets)
- Added sync (with all sub-files) into
/app/assets
- Cloned the following repos into my root-directory:
git clone https://github.com/cmusphinx/sphinxbase
git clone https://github.com/cmusphinx/pocketsphinx
git clone https://github.com/cmusphinx/pocketsphinx-android
- Executed
gradle build
This is how my code looks like:
import android.app.Service;
import android.content.Intent;
import android.os.AsyncTask;
import android.os.IBinder;
import android.util.Log;
import androidx.annotation.Nullable;
import java.io.File;
import java.io.IOException;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import ch.yourclick.kitt.R;
import edu.cmu.pocketsphinx.Assets;
import edu.cmu.pocketsphinx.Hypothesis;
import edu.cmu.pocketsphinx.RecognitionListener;
import edu.cmu.pocketsphinx.SpeechRecognizer;
import edu.cmu.pocketsphinx.SpeechRecognizerSetup;
public class SttService extends Service implements RecognitionListener {
private static final String TAG = "SstService";
/* Named searches allow to quickly reconfigure the decoder */
private static final String KWS_SEARCH = "wakeup";
private static final String FORECAST_SEARCH = "forecast";
private static final String DIGITS_SEARCH = "digits";
private static final String PHONE_SEARCH = "phones";
private static final String MENU_SEARCH = "menu";
/* Keyword we are looking for to activate menu */
private static final String KEYPHRASE = "oh mighty computer";
/* Used to handle permission request */
private static final int PERMISSIONS_REQUEST_RECORD_AUDIO = 1;
private SpeechRecognizer recognizer;
private HashMap<String, Integer> captions;
public SttService() {
// Prepare the data for UI
captions = new HashMap<>();
captions.put(KWS_SEARCH, R.string.kws_caption);
captions.put(MENU_SEARCH, R.string.menu_caption);
captions.put(DIGITS_SEARCH, R.string.digits_caption);
captions.put(PHONE_SEARCH, R.string.phone_caption);
captions.put(FORECAST_SEARCH, R.string.forecast_caption);
Log.e(TAG, "SttService: Preparing the recognition");
// Recognizer initialization is a time-consuming and it involves IO,
// so we execute it in async task
new SetupTask(this).execute();
}
private static class SetupTask extends AsyncTask<Void, Void, Exception> {
WeakReference<SttService> activityReference;
SetupTask(SttService activity) {
this.activityReference = new WeakReference<>(activity);
}
@Override
protected Exception doInBackground(Void... params) {
try {
Assets assets = new Assets(activityReference.get());
File assetDir = assets.syncAssets();
activityReference.get().setupRecognizer(assetDir);
} catch (IOException e) {
return e;
}
return null;
}
@Override
protected void onPostExecute(Exception result) {
if (result != null) {
Log.e(TAG, "onPostExecute: Failed to init recognizer " + result);
} else {
activityReference.get().switchSearch(KWS_SEARCH);
}
}
}
@Override
public void onDestroy() {
super.onDestroy();
if (recognizer != null) {
recognizer.cancel();
recognizer.shutdown();
}
}
@Nullable
@Override
public IBinder onBind(Intent intent) {
return null;
}
/**
* In partial result we get quick updates about current hypothesis. In
* keyword spotting mode we can react here, in other modes we need to wait
* for final result in onResult.
*/
@Override
public void onPartialResult(Hypothesis hypothesis) {
if (hypothesis == null)
return;
String text = hypothesis.getHypstr();
if (text.equals(KEYPHRASE))
switchSearch(MENU_SEARCH);
else if (text.equals(DIGITS_SEARCH))
switchSearch(DIGITS_SEARCH);
else if (text.equals(PHONE_SEARCH))
switchSearch(PHONE_SEARCH);
else if (text.equals(FORECAST_SEARCH))
switchSearch(FORECAST_SEARCH);
else
Log.e(TAG, "onPartialResult: " + text);
}
/**
* This callback is called when we stop the recognizer.
*/
@Override
public void onResult(Hypothesis hypothesis) {
if (hypothesis != null) {
String text = hypothesis.getHypstr();
Log.e(TAG, "onResult: " + text);
}
}
@Override
public void onBeginningOfSpeech() {
}
/**
* We stop recognizer here to get a final result
*/
@Override
public void onEndOfSpeech() {
if (!recognizer.getSearchName().equals(KWS_SEARCH))
switchSearch(KWS_SEARCH);
}
private void switchSearch(String searchName) {
recognizer.stop();
// If we are not spotting, start listening with timeout (10000 ms or 10 seconds).
if (searchName.equals(KWS_SEARCH))
recognizer.startListening(searchName);
else
recognizer.startListening(searchName, 10000);
String caption = getResources().getString(captions.get(searchName));
Log.e(TAG, "switchSearch: "+ caption);
}
private void setupRecognizer(File assetsDir) throws IOException {
// The recognizer can be configured to perform multiple searches
// of different kind and switch between them
recognizer = SpeechRecognizerSetup.defaultSetup()
.setAcousticModel(new File(assetsDir, "en-us-ptm"))
.setDictionary(new File(assetsDir, "cmudict-en-us.dict"))
.setRawLogDir(assetsDir) // To disable logging of raw audio comment out this call (takes a lot of space on the device)
.getRecognizer();
recognizer.addListener(this);
/* In your application you might not need to add all those searches.
They are added here for demonstration. You can leave just one.
*/
// Create keyword-activation search.
recognizer.addKeyphraseSearch(KWS_SEARCH, KEYPHRASE);
// Create grammar-based search for selection between demos
File menuGrammar = new File(assetsDir, "menu.gram");
recognizer.addGrammarSearch(MENU_SEARCH, menuGrammar);
// Create grammar-based search for digit recognition
File digitsGrammar = new File(assetsDir, "digits.gram");
recognizer.addGrammarSearch(DIGITS_SEARCH, digitsGrammar);
// Create language model search
File languageModel = new File(assetsDir, "weather.dmp");
recognizer.addNgramSearch(FORECAST_SEARCH, languageModel);
// Phonetic search
File phoneticModel = new File(assetsDir, "en-phone.dmp");
recognizer.addAllphoneSearch(PHONE_SEARCH, phoneticModel);
}
@Override
public void onError(Exception error) {
Log.e(TAG, "onError: " + error.getMessage());
}
@Override
public void onTimeout() {
switchSearch(KWS_SEARCH);
}
}
My code is almost the same as pocketsphinx-android-demo. The only differences are that I am doing this in a service class, instead of an Activity and I am not asking the user for microphone permission since I do that in the MainActity already. Well, my code has some warnings but no errors.
When I run my app, I get this message (see the full stack trace):
E/SstService: switchSearch: To start demonstration say "oh mighty computer".
But when I say "oh mighty computer" (or anything else), nothing happens. I don't even get an error. So I have no idea where I am stuck and what I am doing wrong.
If there is someone familiar with that library, any help will be appreciated!