Model 3 - Gesture Embedder

Overview

Model 3: Gesture Embedding

With the landmarks identified, this model converts their spatial relationships into a mathematical representation suitable for classification.

  • Model: gesture_embedder.xml
  • Purpose: To create a compact numerical representation (a feature vector or "embedding") of the hand's current pose. This embedding captures the essential information about the gesture, independent of hand size or position.
  • Inputs:
    • hand: [1, 21, 3] - The normalized 21 landmarks from Stage 2.
    • handedness: [1, 1] - The handedness score from Stage 2.
    • world_hand: [1, 21, 3] - The world landmarks from Stage 2.
  • Output: [1, 128] - A 128-dimensional feature vector (the gesture embedding).
  • Key Functions & Logic:
    1. Inference (mediapipe_style_gesture_processing in hand_landmark.py): The three inputs (normalized landmarks, handedness, world landmarks) are passed to the gesture_embedder model in a single forward pass. The resulting 128-dimensional embedding is stored on the HandRegion object for downstream classification.
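The input/output contract above can be sketched as follows. This is a minimal NumPy stand-in, not the real network: `embed_gesture` is a hypothetical helper, and the random projection merely mimics the shape behavior of the `gesture_embedder` model.

```python
import numpy as np

def embed_gesture(hand, handedness, world_hand):
    """Sketch of the embedder call: validates the documented tensor
    shapes and returns a [1, 128] vector (stand-in for the model)."""
    assert hand.shape == (1, 21, 3)        # normalized landmarks from Stage 2
    assert handedness.shape == (1, 1)      # handedness score from Stage 2
    assert world_hand.shape == (1, 21, 3)  # world landmarks from Stage 2

    # Stand-in for inference: flatten the three inputs and project
    # them to 128 dimensions with fixed (illustrative) weights.
    feats = np.concatenate([hand.ravel(), handedness.ravel(), world_hand.ravel()])
    rng = np.random.default_rng(0)
    weights = rng.standard_normal((feats.size, 128))
    return (feats @ weights)[None, :]      # shape [1, 128]
```

In the actual pipeline the result would be attached to the HandRegion object rather than returned to the caller.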

Next Steps

With a gesture embedding computed for each detected hand, the pipeline proceeds to:

  • Model 4: Classify embeddings into predefined gestures
  • Custom Gestures: Train user-defined gesture recognizers
  • Similarity Matching: Compare against gesture databases
  • Real-time Classification: Low-latency, per-frame gesture recognition
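The similarity-matching step can be sketched with cosine similarity over the 128-dimensional embeddings. This assumes a plain dict keyed by gesture name as the "database"; `best_match` is a hypothetical helper, not part of the pipeline code.

```python
import numpy as np

def best_match(embedding, database):
    """Return the gesture label whose stored embedding has the highest
    cosine similarity to the query embedding."""
    names = list(database)
    mat = np.stack([database[n].ravel() for n in names])  # [N, 128]
    query = embedding.ravel()
    sims = (mat @ query) / (np.linalg.norm(mat, axis=1) * np.linalg.norm(query))
    return names[int(np.argmax(sims))]
```

Cosine similarity is a natural choice here because the embedding is meant to capture pose independent of hand size, so direction matters more than magnitude.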