I'm trying to build a sign language detection application. I'm using MediaPipe Holistic to extract keypoints and will use an LSTM to train the model.

MediaPipe Holistic generates a total of 543 landmarks (33 pose landmarks, 468 face landmarks, and 21 hand landmarks per hand) for each sign language gesture.
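
Concretely, one frame's Holistic results can be flattened into a single fixed-length feature vector roughly like this (a sketch; `extract_keypoints` is an arbitrary helper name, and missing detections are zero-filled so the vector length stays constant):

```python
import numpy as np
import mediapipe as mp

# `results` is the object returned by
# mp.solutions.holistic.Holistic().process(image) for one video frame.
def extract_keypoints(results):
    """Flatten one frame's MediaPipe Holistic results into one vector."""
    pose = (np.array([[lm.x, lm.y, lm.z, lm.visibility]
                      for lm in results.pose_landmarks.landmark]).flatten()
            if results.pose_landmarks else np.zeros(33 * 4))
    face = (np.array([[lm.x, lm.y, lm.z]
                      for lm in results.face_landmarks.landmark]).flatten()
            if results.face_landmarks else np.zeros(468 * 3))
    left_hand = (np.array([[lm.x, lm.y, lm.z]
                           for lm in results.left_hand_landmarks.landmark]).flatten()
                 if results.left_hand_landmarks else np.zeros(21 * 3))
    right_hand = (np.array([[lm.x, lm.y, lm.z]
                            for lm in results.right_hand_landmarks.landmark]).flatten()
                  if results.right_hand_landmarks else np.zeros(21 * 3))
    # 33*4 + 468*3 + 21*3 + 21*3 = 1662 values per frame
    return np.concatenate([pose, face, left_hand, right_hand])
```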

Now, my question is: how can I connect the 543 landmarks to a gesture? Is there a way to tell the computer that the keypoints it is extracting belong to a certain gesture?

BZKN

1 Answer

The answer to your question can be found in Gabriel Guerin's excellent article and accompanying code samples. Note that the code sample only looks at the hand landmarks. I'd pretty much have to paste the whole article to answer the question completely, but here is a high-level overview:

1. Convert the landmarks into feature vectors.
2. Build a model consisting of several frames, with each frame containing the vectors of the hands.
3. Use Dynamic Time Warping (DTW) to compare a given sign with a small set of known signs.
4. Use a threshold of similarity to the samples to offer a prediction of the sign. (A sketch of these steps follows below.)

This technique works well if there is only a small number of signs to recognize; it would break down if a full sign-language vocabulary were used. Deep learning with a classifier would be a better technique for a large vocabulary (see the second sketch below). Even this would probably break down, because a real sign language is not a collection of signs with a one-to-one correspondence to spoken words. Sign languages have complex structures that can have different word orders, and prepositions that are expressed only by the direction the signer is facing.

I would be very interested in any projects that can recognize more than a few signs. I believe the Holistic model will make it possible, but it will require a large corpus of samples and a way to interpret the complex grammars.
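
Here is a minimal, self-contained sketch of steps 1–4 (the function names, the reference dictionary, and the threshold are illustrative, not Guerin's actual code):

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Dynamic Time Warping distance between two signs, each given as an
    array of per-frame feature vectors with shape (n_frames, n_features)."""
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])  # frame-to-frame distance
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

def predict_sign(sample, references, threshold=50.0):  # threshold is illustrative
    """Compare a recorded sample against one reference recording per known
    sign; return the closest sign, or None if nothing is similar enough."""
    best_sign, best_dist = None, np.inf
    for sign, reference in references.items():
        dist = dtw_distance(sample, reference)
        if dist < best_dist:
            best_sign, best_dist = sign, dist
    return best_sign if best_dist < threshold else None
```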
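And since you mentioned LSTM: the classifier approach for a larger vocabulary could look roughly like this in Keras (the layer sizes, the 30-frame clip length, the 1662-feature vector from pose + face + both hands, and the 10-sign vocabulary are all illustrative assumptions):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

n_frames, n_features, n_signs = 30, 1662, 10  # assumed shapes, not fixed by MediaPipe

model = Sequential([
    LSTM(64, return_sequences=True, input_shape=(n_frames, n_features)),
    LSTM(128),                             # last LSTM returns a single summary vector
    Dense(64, activation="relu"),
    Dense(n_signs, activation="softmax"),  # one probability per sign label
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# X: keypoint sequences stacked to shape (n_clips, n_frames, n_features);
# y: one-hot sign labels of shape (n_clips, n_signs).
# model.fit(X, y, epochs=200)
```

This is where "connecting the keypoints to a gesture" happens: each recorded sequence of keypoint vectors is stored under the label of the sign being performed, and those labels become the classifier's targets.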

David Stone