Model 3 - Gesture Embedder
Overview
With the landmarks identified, this model converts their spatial relationships into a mathematical representation suitable for classification.
- Model: `gesture_embedder.xml`
- Purpose: To create a compact numerical representation (a feature vector, or "embedding") of the hand's current pose. This embedding captures the essential information about the gesture, independent of hand size or position.
- Inputs:
  - `hand`: `[1, 21, 3]` - the normalized 21 landmarks from Stage 2.
  - `handedness`: `[1, 1]` - the handedness score from Stage 2.
  - `world_hand`: `[1, 21, 3]` - the world landmarks from Stage 2.
- Output:
  - `[1, 128]` - a 128-dimensional feature vector (the gesture embedding).
- Key Functions & Logic:
  - Inference (`mediapipe_style_gesture_processing` in `hand_landmark.py`): The three inputs (landmarks, handedness, world landmarks) are passed to the `gesture_embedder` model. The resulting 128-dimensional embedding is stored in the `HandRegion` object.
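The I/O contract and inference step described above can be sketched as follows. This is a minimal mock, not the real implementation: `run_gesture_embedder` stands in for the actual `gesture_embedder` model, and the `HandRegion` field names shown are illustrative assumptions, not the real attribute names from `hand_landmark.py`.

```python
import numpy as np
from dataclasses import dataclass

def run_gesture_embedder(hand, handedness, world_hand):
    """Stub for the real gesture_embedder model: validates the documented
    input shapes and returns a dummy [1, 128] embedding."""
    assert hand.shape == (1, 21, 3)        # normalized landmarks
    assert handedness.shape == (1, 1)      # handedness score
    assert world_hand.shape == (1, 21, 3)  # world landmarks
    return np.zeros((1, 128), dtype=np.float32)

@dataclass
class HandRegion:
    """Illustrative stand-in for the pipeline's HandRegion object."""
    landmarks: np.ndarray        # (21, 3) normalized landmarks from Stage 2
    handedness: float            # handedness score from Stage 2
    world_landmarks: np.ndarray  # (21, 3) world landmarks from Stage 2
    gesture_embedding: np.ndarray = None

def embed_hand(region, embedder=run_gesture_embedder):
    """Mirrors the described logic: batch the three Stage 2 outputs,
    run the embedder, and cache the 128-d vector on the region."""
    region.gesture_embedding = embedder(
        region.landmarks[None, ...],
        np.array([[region.handedness]], dtype=np.float32),
        region.world_landmarks[None, ...],
    )
    return region

region = embed_hand(HandRegion(
    landmarks=np.zeros((21, 3), np.float32),
    handedness=0.9,
    world_landmarks=np.zeros((21, 3), np.float32),
))
print(region.gesture_embedding.shape)  # (1, 128)
```

The embedding is computed once per detected hand and cached on the region, so downstream classifiers can reuse it without re-running the model.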
Next Steps
With high-quality gesture embeddings generated, the pipeline proceeds to:
- Model 4: Classify embeddings into predefined gestures
- Custom Gestures: Train user-defined gesture recognizers
- Similarity Matching: Compare against gesture databases
- Real-time Classification: Instantaneous gesture recognition
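One common way to implement the similarity-matching step is cosine similarity between 128-d embeddings. This is an assumption for illustration, not a metric the pipeline mandates; the gesture names and random embeddings below are toy stand-ins.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two gesture embeddings: 1.0 means the
    vectors point the same way, 0.0 means they are orthogonal."""
    a, b = a.ravel(), b.ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def best_match(query: np.ndarray, database: dict) -> str:
    """Return the name of the stored gesture whose embedding is most
    similar to the query embedding."""
    return max(database, key=lambda name: cosine_similarity(query, database[name]))

# Toy database of stored embeddings (random stand-ins for real ones).
rng = np.random.default_rng(0)
db = {
    "thumbs_up": rng.normal(size=(1, 128)),
    "open_palm": rng.normal(size=(1, 128)),
}
# A query close to the stored "thumbs_up" embedding.
query = db["thumbs_up"] + 0.01 * rng.normal(size=(1, 128))
print(best_match(query, db))  # thumbs_up
```

Because cosine similarity ignores vector magnitude, it tolerates overall scaling of the embedding, which pairs well with an embedding designed to be independent of hand size.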