
I am working on a project that involves hand gesture recognition. I have to recognize a hand gesture and identify which letter of the alphabet it represents. I am able to detect the skin using the HSV color space. I have a video of the hand gestures for all letters, and images of the hand gestures for all letters. Now I have to find which gesture represents which letter of the alphabet. I need to know how to compare the gestures in each frame of the video with the image gestures. I am new to OpenCV, please can someone help me. This is my code:

#include <opencv2/opencv.hpp>
#include <iostream>

using namespace cv;
using std::cout;

/*--------------- SKIN SEGMENTATION ---------------*/
int main() {

    VideoCapture cap("E:\\videotest.mp4");

    if (!cap.isOpened())
    {   // check if we succeeded
        printf("could not open video\n");
        return -1;
    }
    Mat3b frame;
    while (cap.read(frame)) {

        /* THRESHOLD ON HSV */
        cvtColor(frame, frame, CV_BGR2HSV);
        GaussianBlur(frame, frame, Size(7, 7), 1, 1);
        medianBlur(frame, frame, 15);
        for (int r = 0; r < frame.rows; ++r) {
            for (int c = 0; c < frame.cols; ++c) {
                // keep pixels with 5 < H < 17, 38 < S < 250, 51 < V < 242; zero out the rest
                if (!((frame(r, c)[0] > 5)  && (frame(r, c)[0] < 17)  &&
                      (frame(r, c)[1] > 38) && (frame(r, c)[1] < 250) &&
                      (frame(r, c)[2] > 51) && (frame(r, c)[2] < 242)))
                    for (int i = 0; i < 3; ++i) frame(r, c)[i] = 0;
            }
        }

        /* BGR CONVERSION AND THRESHOLD */
        Mat1b frame_gray;
        cvtColor(frame, frame, CV_HSV2BGR);
        cvtColor(frame, frame_gray, CV_BGR2GRAY);
        threshold(frame_gray, frame_gray, 60, 255, CV_THRESH_BINARY);
        morphologyEx(frame_gray, frame_gray, CV_MOP_ERODE, Mat1b(3, 3, 1), Point(-1, -1), 3);
        morphologyEx(frame_gray, frame_gray, CV_MOP_OPEN,  Mat1b(7, 7, 1), Point(-1, -1), 1);
        morphologyEx(frame_gray, frame_gray, CV_MOP_CLOSE, Mat1b(9, 9, 1), Point(-1, -1), 1);

        medianBlur(frame_gray, frame_gray, 15);
        // imshow("Threshold", frame_gray);

        cvtColor(frame, frame, CV_BGR2HSV);
        resize(frame, frame, Size(), 0.5, 0.5);
        imshow("Video", frame);

        // Note: loading the reference image once, before the loop, would be more efficient.
        Mat3b image;
        image = imread("E:/hand.jpg", CV_LOAD_IMAGE_COLOR);   // Read the file

        if (!image.data)                              // Check for invalid input
        {
            cout << "Could not open or find the image" << std::endl;
            return -1;
        }
        cvtColor(image, image, CV_BGR2HSV);
        namedWindow("Display window", WINDOW_AUTOSIZE); // Create a window for display.
        imshow("Display window", image);                // Show our image
        waitKey(1);                                     // was waitkey(1): C++ is case-sensitive
    }
    return 0;
}

2 Answers


There are several ways to approach this problem. The most widely used and obvious would be to track each individual finger (or out-pointing part of the hand) and write rules to classify each gesture (e.g. two fingers pointing outward from the top of the hand could be the "peace" symbol).

You can do this by tracking convex hull defects. Here is a link to a tutorial that will explain this process; it's written in Python, but I'm sure you can port it to C++ once you understand the logic.

However, if you already have images of each gesture, I would suggest using a neural network for classification. Try to manipulate the images you already have so that they resemble the images you are trying to classify (i.e. do skin detection and binarize the images).

Here is another link to a tutorial explaining what neural networks are, how they work and how to implement an image recognition network in C++.

I must mention that each pixel will likely be used as an input to the network, so to take some strain off it (and make it train quicker) I would suggest resizing your images to make them as small as possible while you can still make out the gesture.

Hope some of this info helps. Good luck!

Aphire
  • You can also look into using the eigen pattern recognition method (usually reserved for face recognition, but in theory should work on any pattern) – Aphire Mar 10 '15 at 11:27

"I need to know how to compare the gestures from the each frame of video with the image gestures." - The key is figuring out what measure of similarity would work for your application.

There is no one-size-fits-all way of comparing images, and definitely not video sequences (a much harder problem than images). A popular way to compare images is "Earth Mover's Distance" on color histograms, but this probably won't work in your case. You might try HoG recognizers trained on different gestures; or (for example) the difference of DCT coefficients between images scaled down to a really small size like 32x32. Plain template matching (OpenCV matchTemplate) likely won't work here, because you want to compare an image to a category (all possible images of the same kind of thing), and a single template doesn't capture that. Template matching with k-nearest-neighbor classification and a large library of examples (a few thousand per category) might work.

To recognize hand gestures (with movement) as opposed to hand shapes (not moving), your best bet is to read the literature and implement a published algorithm. Try a Google Scholar search for "hand gesture recognition video".

Lastly, this is going to be pretty hard; don't expect anything in OpenCV to do the job in a simple way. There is HoG in OpenCV, but you'll have to train it, and tweak it extensively. Other published algorithms (like 3D wavelets) you'll have to build from scratch and/or add another library to OpenCV. Good luck :)

Alex I