Combining Object Detection with Text to Speech Code

Question

I am trying to write object detection + text-to-speech code that detects objects and produces a voice output on the Raspberry Pi 4. For now, I am trying to write a simple Python script that combines both elements in a single .py file, preferably as a function, which I will then run on the Raspberry Pi. I want to give credit to Murtaza's Workshop "Object Detection OpenCV Python | Easy and Fast (2020)" and to https://pypi.org/project/pyttsx3/ for the pyttsx3 text-to-speech documentation. I have attached the code below.

When I run the program, I keep getting errors with the text-to-speech code (commented lines 33-36 for reference). I believe it is some looping error, but I can't get the program to run continuously. If I run the code without the TTS part, it works fine. With it, the program runs for perhaps 3-5 seconds and then suddenly stops. I am a beginner but highly passionate about computer vision, and any help is appreciated!

import cv2
#import pyttsx3

cap = cv2.VideoCapture(0)
cap.set(3, 640)
cap.set(4, 480)

classNames = []
classFile = 'coco.names'
with open(classFile,'rt') as f:
    classNames = [line.rstrip() for line in f]

configPath = 'ssd_mobilenet_v3_large_coco_2020_01_14.pbtxt'
weightsPath = 'frozen_inference_graph.pb'

net = cv2.dnn_DetectionModel(weightsPath, configPath)
net.setInputSize(320, 320)
net.setInputScale(1.0 / 127.5)
net.setInputMean((127.5, 127.5, 127.5))
net.setInputSwapRB(True)

while True:
    success, img = cap.read()
    classIds, confs, bbox = net.detect(img, confThreshold=0.45)
    if len(classIds) != 0:
        for classId, confidence, box in zip(classIds.flatten(), confs.flatten(), bbox):
            className = classNames[classId-1]
            #engine = pyttsx3.init()
            #str1 = str(className)
            #engine.say(str1 + "detected")
            #engine.runAndWait()
            cv2.rectangle(img, box, color=(0, 255, 0), thickness=2)
            cv2.putText(img, classNames[classId-1].upper(), (box[0]+10, box[1]+30),
                cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), 2)
            cv2.putText(img, str(round(confidence * 100, 2)), (box[0]+200, box[1]+30),
                cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow('Output', img)
    cv2.waitKey(1)

Here is a link to download the files needed to run the code, in case they are needed.

Here is the error:

/Users/venuchannarayappa/PycharmProjects/ObjectDetector/venv/bin/python /Users/venuchannarayappa/PycharmProjects/ObjectDetector/main.py
Traceback (most recent call last):
  File "/Users/venuchannarayappa/PycharmProjects/ObjectDetector/main.py", line 24, in <module>
    classIds, confs, bbox = net.detect(img, confThreshold=0.45)
cv2.error: OpenCV(4.5.4) /Users/runner/work/opencv-python/opencv-python/opencv/modules/imgproc/src/resize.cpp:4051: error: (-215:Assertion failed) !ssize.empty() in function 'resize'

Process finished with exit code 1
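
The `!ssize.empty()` assertion in `resize` generally means that the image passed to `net.detect` was empty, which is what happens when `cap.read()` does not deliver a frame. A minimal sketch of a guard follows. The helper name `frame_is_usable` is hypothetical (it is not part of OpenCV), and it only inspects the two values that `cap.read()` returns.

```python
def frame_is_usable(success, img):
    """Return True only when cap.read() reported success and the frame has pixels."""
    if not success or img is None:
        return False
    # OpenCV frames are NumPy arrays, whose .size is the number of elements;
    # an empty frame has size 0. Objects without .size are treated as unusable.
    return getattr(img, "size", 0) > 0


# Intended use inside the capture loop, before calling net.detect:
#     success, img = cap.read()
#     if not frame_is_usable(success, img):
#         continue
```

Skipping a bad frame with `continue` keeps the loop alive; whether to retry or stop after repeated failures is a separate design choice.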

Link to video output recorded through iphone: https://www.icloud.com/iclouddrive/03jGfqy7-A9DKfekcu3wjk0rA#IMG_4932

Sorry for such a long post! I debugged my code for the past few hours and I think I got it to work. I changed only the main while loop; the rest of the code is the same. The program now seems to run continuously for me. I would appreciate any comments if you have difficulties running it.

# Create the TTS engine once, outside the loop, rather than on every detection.
# (Requires the "import pyttsx3" line at the top of the file to be uncommented.)
engine = pyttsx3.init()
while True:
    success, img = cap.read()
    #print(success)
    #print(img)
    #print(img.shape)
    classIds, confs, bbox = net.detect(img, confThreshold=0.45)
    if len(classIds) != 0:
        for classId, confidence, box in zip(classIds.flatten(), confs.flatten(), bbox):
            className = classNames[classId - 1]
            #print(len(classIds))
            str1 = str(className)
            #print(str1)
            # Keep a space before "detected" so the phrase is spoken as two words.
            engine.say(str1 + " detected")
            engine.runAndWait()
            cv2.rectangle(img, box, color=(0, 255, 0), thickness=2)
            cv2.putText(img, classNames[classId-1].upper(), (box[0]+10, box[1]+30),
                cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), 2)
            cv2.putText(img, str(round(confidence * 100, 2)), (box[0]+200, box[1]+30),
                cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), 2)
        # Note: this skips the imshow call below, so frames with detections are not displayed.
        continue
    cv2.imshow('Output', img)
    cv2.waitKey(1)
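
Because `engine.runAndWait()` blocks until the phrase has been spoken, the loop above announces every box in every frame, which stalls the video. One possible way to reduce this is to speak each label at most once per cooldown period. The sketch below is only an illustration: `Announcer`, `make_pyttsx3_speaker`, and the 5-second cooldown are hypothetical names and values, not part of pyttsx3 or OpenCV.

```python
import time


class Announcer:
    """Decide which labels to speak, skipping ones announced recently."""

    def __init__(self, speak, cooldown=5.0, clock=time.monotonic):
        self._speak = speak        # callable taking one string
        self._cooldown = cooldown  # seconds before a label may be repeated
        self._clock = clock        # injectable so the logic can be tested
        self._last_spoken = {}     # label -> time of its last announcement

    def announce(self, labels):
        """Speak each eligible label once; return the labels actually spoken."""
        now = self._clock()
        spoken = []
        for label in dict.fromkeys(labels):  # de-duplicate, keep order
            last = self._last_spoken.get(label)
            if last is None or now - last >= self._cooldown:
                self._speak(label + " detected")
                self._last_spoken[label] = now
                spoken.append(label)
        return spoken


def make_pyttsx3_speaker(engine):
    """Wrap an already-initialised pyttsx3 engine as a speak(text) callable."""
    def speak(text):
        engine.say(text)
        engine.runAndWait()
    return speak


# Intended use with the loop above (assumes engine = pyttsx3.init()):
#     announcer = Announcer(make_pyttsx3_speaker(engine))
#     announcer.announce(classNames[i - 1] for i in classIds.flatten())
```

Passing the speak function in as a parameter keeps the throttling logic independent of the audio backend, so it can be tried out without a camera or speakers.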

I am planning to run this code on the Raspberry Pi and to install OpenCV with this command: pip3 install opencv-python. However, I am not sure how to install pyttsx3, since I think I need to install it from source. Please let me know if there is a simple method to install pyttsx3.

Update: As of December 27th, I have installed all necessary packages and my code is now functional.

`I always keep getting errors with the Text to speech code` what errors? You need to post the full stacktrace here of any errors that you get. — Random Davis, Nov 26 '21 at 19:35
I posted the error I get below /Users/venuchannarayappa/PycharmProjects/ObjectDetector/venv/bin/python /Users/venuchannarayappa/PycharmProjects/ObjectDetector/main.py Traceback (most recent call last): File "/Users/venuchannarayappa/PycharmProjects/ObjectDetector/main.py", line 24, in classIds, confs, bbox = net.detect(img, confThreshold=0.45) cv2.error: OpenCV(4.5.4) /Users/runner/work/opencv-python/opencv-python/opencv/modules/imgproc/src/resize.cpp:4051: error: (-215:Assertion failed) !ssize.empty() in function 'resize' Process finished with exit code 1 — LovePlayingChess, Nov 26 '21 at 21:11
On this site, you need to put any errors or code you have into the question itself, not into a comment, since comments don't have formatting and they have limited space. — Random Davis, Nov 26 '21 at 21:13
Is your camera working? Is "success" true? What is the value of "img.shape"? I thought .read() returns None when it fails, but now I'm not that sure. — Michal Hradiš, Nov 26 '21 at 21:35
Sorry for late response. Yes, the camera is working since I receive a pop-up with objects being detected with my webcam. However, the code is not continuous for some reason and exits the loop. Therefore, the program detects and speaks the names of the first 3-4 objects perfectly but then suddenly gives the output "Process finished with exit code 0". I believe "Success" is true. I wrote the line: print(success) and output is True. I have attached a video of the output in the response. img.shape is a tuple with 3 values (480,640,3) — LovePlayingChess, Nov 28 '21 at 05:00
I followed the video https://www.youtube.com/watch?v=AWhDDl-7Iis&ab_channel=AiPhile to install pyttsx3. My functional code should also be listed above. — LovePlayingChess, Dec 27 '21 at 16:50

LovePlayingChess · Accepted Answer · 2021-12-28T16:46:08.890

I installed pyttsx3 using these two commands in the terminal on the Raspberry Pi:

sudo apt update && sudo apt install espeak ffmpeg libespeak1
pip install pyttsx3
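
After installing, a quick check can confirm that speech output works before combining it with the detection loop. This is a minimal sketch: the function name `tts_smoke_test` and the test phrase are hypothetical, and it uses only the `pyttsx3.init()`, `say()`, and `runAndWait()` calls that already appear in the question.

```python
def tts_smoke_test(text="text to speech is working"):
    """Try to speak one phrase; return True on success, False otherwise."""
    try:
        import pyttsx3  # imported here so a missing package is reported cleanly
    except ImportError:
        print("pyttsx3 is not installed")
        return False
    try:
        engine = pyttsx3.init()
        engine.say(text)
        engine.runAndWait()
    except Exception as exc:  # e.g. missing espeak library or no audio driver
        print("pyttsx3 could not speak:", exc)
        return False
    return True
```

Calling `tts_smoke_test()` from a Python shell on the Pi should produce audible speech if the packages above were installed correctly.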

I followed the video youtube.com/watch?v=AWhDDl-7Iis&ab_channel=AiPhile to install pyttsx3. My functional code is also listed above in the question. My question is resolved, and I hope this is useful to anyone looking to write a similar program. I have made minor tweaks to my code.

edited Dec 28 '21 at 16:46

answered Dec 27 '21 at 16:51

LovePlayingChess


While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - From Review – Flair Dec 27 '21 at 20:31
sorry about that. I will reference the code directly. – LovePlayingChess Dec 28 '21 at 16:43
