Python: voice detection using TensorFlow

Question

I am building voice detection with TensorFlow. The computer records my voice, and when I say "Hey Jarvis" it should print "voice has been detected". However, the program is not working properly: when I say "Hey Jarvis", it always prints "voice not detected". I have tried many things, but nothing has worked and I don't know how to fix it. I hope someone can help. Thanks.

Here is my code:

######## IMPORTS ##########
import sounddevice as sd
from scipy.io.wavfile import write
import librosa
import numpy as np
from tensorflow.keras.models import load_model

####### ALL CONSTANTS #####
fs = 44100
seconds = 2
filename = "C:\\Users\\adamn\\Documents\\voice detection\\WakeWordDetection-master\\WakeWordDetection-master\\prediction.wav"
class_names = ["Wake Word NOT Detected", "Wake Word Detected"]

##### LOADING OUR SAVED MODEL and PREDICTING ###
model = load_model("C:\\Users\\adamn\\Documents\\voice detection\\WakeWordDetection-master\\WakeWordDetection-master\\saved_model\\WWD2.h5")

print("Prediction Started: ")
i = 0
while True:
    print("Say Now: ")
    myrecording = sd.rec(int(seconds * fs), samplerate=fs, channels=2)
    sd.wait()
    write(filename, fs, myrecording)

    audio, sample_rate = librosa.load(filename)
    mfcc = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
    mfcc_processed = np.mean(mfcc.T, axis=0)

    prediction = model.predict(np.expand_dims(mfcc_processed, axis=0))
    if prediction[:, 1] > 0.10:
        print(f"Wake Word Detected for ({i})")
        print("Confidence:", prediction[:, 1])
        i += 1
    
    else:
        print("Wake Word NOT Detected")
        print("Confidence:", prediction[:, 0])

I said "Hey Jarvis" four times, but it was never detected. Here is the terminal output:

PS C:\Users\adamn\Documents\voice detection\WakeWordDetection-master> & C:/Users/adamn/AppData/Local/Programs/Python/Python310/python.exe "c:/Users/adamn/Documents/voice detection/WakeWordDetection-master/WakeWordDetection-master/prediction.py"
2022-12-30 11:55:29.369866: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-12-30 11:55:29.370141: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-12-30 11:55:30.987283: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'nvcuda.dll'; dlerror: nvcuda.dll not found
2022-12-30 11:55:30.987446: W tensorflow/stream_executor/cuda/cuda_driver.cc:263] failed call to cuInit: UNKNOWN ERROR (303)
2022-12-30 11:55:30.990920: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: KhoaLaptop
2022-12-30 11:55:30.991644: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: KhoaLaptop
2022-12-30 11:55:30.992261: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Prediction Started: 
Say Now:
1/1 [==============================] - 0s 60ms/step
Wake Word NOT Detected
Confidence: [1.]
Say Now:
1/1 [==============================] - 0s 14ms/step
Wake Word NOT Detected
Confidence: [1.]
Say Now:
1/1 [==============================] - 0s 13ms/step
Wake Word NOT Detected
Confidence: [1.]
Say Now:
1/1 [==============================] - 0s 11ms/step
Wake Word NOT Detected
Confidence: [1.]
Say Now:

What is that model and how did you create it? The most likely reason is that the model is not any good - MFCC averaged over 2 seconds is not a good feature representation for keyword detection. — Jon Nordby, Dec 31 '22 at 20:08
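To illustrate the point about averaging, here is a minimal sketch using a random matrix as a stand-in for a real MFCC matrix (the 40 x 87 shape and the random values are assumptions for illustration, not taken from the question's recordings). It shows that the time-averaged feature is identical when the frames are played in reverse order, so the order of sounds within the 2 seconds is lost before the model ever sees the input.

```python
import numpy as np

# Hypothetical stand-in for librosa's MFCC output: 40 coefficients x 87 frames.
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(40, 87))

# The same frames in reverse time order (as if the audio were played backwards).
mfcc_reversed = mfcc[:, ::-1]

# Same reduction as in the question: average over the time axis.
mean_forward = np.mean(mfcc.T, axis=0)
mean_reversed = np.mean(mfcc_reversed.T, axis=0)

print(mean_forward.shape)
print(np.allclose(mean_forward, mean_reversed))
```

Both inputs collapse to the same 40-value vector, which is one reason a model trained on this feature may struggle to tell a specific phrase apart from other speech.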

score 0 · Answer 1 · answered Jan 03 '23 at 01:13

The model you used is not a general voice activity detection model. It detects one specific wake word. If you want it to respond to your phrase, "Hey Jarvis", you must retrain the model for that phrase. If you have already retrained it, then the model's performance is poor, and you can try training it a few more times.
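Before retraining, it may also be worth ruling out a silent or very quiet recording, since that would give the same "not detected" result regardless of the model. The following is only a sketch: `recording_stats` is a hypothetical helper (not part of sounddevice or librosa), and the two arrays are synthetic stand-ins for what `sd.rec` returns with `channels=2`.

```python
import numpy as np

def recording_stats(recording):
    """Return (peak, rms) amplitude of a recording array (hypothetical helper)."""
    samples = np.asarray(recording, dtype=np.float64)
    peak = float(np.max(np.abs(samples)))
    rms = float(np.sqrt(np.mean(samples ** 2)))
    return peak, rms

fs = 44100
seconds = 2

# Synthetic stand-in for a recording where the microphone captured nothing.
silence = np.zeros((seconds * fs, 2), dtype=np.float32)

# Synthetic stand-in for a recording with a clearly audible signal (440 Hz tone).
t = np.arange(seconds * fs) / fs
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
audible = np.stack([tone, tone], axis=1).astype(np.float32)

print("silence:", recording_stats(silence))
print("audible:", recording_stats(audible))
```

In the question's loop, calling `recording_stats(myrecording)` right after `sd.wait()` would show whether the input is close to zero. If the levels look normal, the problem is more likely in the model or its features, as described above.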
