I'm working on a hand tracking script that utilizes MediaPipe for hand landmark detection and gesture recognition. I want to optimize the script to run on my GPU for faster performance. However, I'm encountering a couple of issues.
Checking GPU allocation: when I run import cv2.cuda, create a cv2.cuda_GpuMat(), and call isContinuous() on it to check whether the image is allocated on the GPU, I get False. This suggests the image is not allocated on the GPU. I have verified that my GPU supports CUDA and have installed the necessary CUDA drivers.
Here's the output of running nvidia-smi in my terminal:
Sun Jul 16 09:46:38 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 536.40                 Driver Version: 536.40       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 2060      WDDM  | 00000000:2B:00.0  On |                  N/A |
|  0%   49C    P8              10W / 170W |    785MiB /  6144MiB |      2%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
TensorFlow Lite XNNPACK delegate: I'm also seeing the message "INFO: Created TensorFlow Lite XNNPACK delegate for CPU." in my console output, which suggests inference is running on the CPU. I want to make sure the script uses my GPU for acceleration instead of falling back to CPU execution.
Here is the folder structure of my project:
- README.md
- main.py
- models/
  - gesture_recognizer.task
  - hand_landmark_full.tflite
  - hand_landmarker.task
  - palm_detection_full.tflite
The main.py script contains the hand tracking implementation. I have already set the use_gpu flag to True in the mp_hands.Hands initialization.
I also ran import cv2 followed by cv2.cuda.getCudaEnabledDeviceCount(), and it outputs 0, indicating that CUDA is not available. Here's the code and the output:
import cv2

if cv2.cuda.getCudaEnabledDeviceCount() > 0:
    print("CUDA is available!")
else:
    print("CUDA is not available.")
Output:
CUDA is not available.
I'm working inside a virtual environment, here's the list of packages and versions installed:
absl-py==1.4.0
asttokens==2.2.1
attrs==23.1.0
backcall==0.2.0
cffi==1.15.1
colorama==0.4.6
comm==0.1.3
contourpy==1.1.0
cycler==0.11.0
debugpy==1.6.7
decorator==5.1.1
EasyProcess==1.1
entrypoint2==1.1
executing==1.2.0
flatbuffers==23.5.26
fonttools==4.41.0
ipykernel==6.23.1
ipython==8.14.0
jedi==0.18.2
jupyter_client==8.2.0
jupyter_core==5.3.0
kiwisolver==1.4.4
matplotlib==3.7.2
matplotlib-inline==0.1.6
mediapipe==0.10.2
MouseInfo==0.1.3
mss==9.0.1
nest-asyncio==1.5.6
numpy==1.25.1
opencv-contrib-python==4.8.0.74
opencv-python==4.8.0.74
packaging==23.1
parso==0.8.3
pickleshare==0.7.5
Pillow==10.0.0
platformdirs==3.5.1
prompt-toolkit==3.0.38
protobuf==3.20.3
psutil==5.9.5
pure-eval==0.2.2
PyAutoGUI==0.9.54
pycparser==2.21
PyGetWindow==0.0.9
Pygments==2.15.1
PyMsgBox==1.0.9
pyparsing==3.0.9
pyperclip==1.8.2
PyRect==0.2.0
pyscreenshot==3.1
PyScreeze==0.1.29
python-dateutil==2.8.2
pytweening==1.0.7
pywin32==306
pyzmq==25.1.0
six==1.16.0
sounddevice==0.4.6
stack-data==0.6.2
tornado==6.3.2
traitlets==5.9.0
wcwidth==0.2.6
I would appreciate any guidance or suggestions on how to correctly configure and run the script on my GPU for improved performance. What steps can I take to ensure the GPU acceleration is properly utilized? Is there anything specific I need to do with TensorFlow Lite to enable GPU acceleration?
Thank you for your help!