Model 3 - Gesture Embedder

Overview

Model 3: Gesture Embedding

With the landmarks identified, this model converts their spatial relationships into a mathematical representation suitable for classification.

  • Model: gesture_embedder.xml
  • Purpose: To create a compact numerical representation (a feature vector or "embedding") of the hand's current pose. This embedding captures the essential information about the gesture, independent of hand size or position.
  • Inputs:
    • hand: [1, 21, 3] - The normalized 21 landmarks from Stage 2.
    • handedness: [1, 1] - The handedness score from Stage 2.
    • world_hand: [1, 21, 3] - The world landmarks from Stage 2.
  • Output: [1, 128] - A 128-dimensional feature vector (the gesture embedding).
  • Key Functions & Logic:
    1. Inference (mediapipe_style_gesture_processing in hand_landmark.py): The three inputs (normalized landmarks, handedness, world landmarks) are passed to the gesture_embedder model in a single forward pass. The resulting 128-dimensional embedding is stored on the HandRegion object for downstream classification.
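The input/output contract above can be sketched as follows. This is a minimal NumPy stand-in, not the real network: `embed_gesture` is a hypothetical helper, and the random projection merely mimics the shape behavior of the `gesture_embedder` model.

```python
import numpy as np

def embed_gesture(hand, handedness, world_hand):
    """Sketch of the embedder call: validates the documented tensor
    shapes and returns a [1, 128] vector (stand-in for the model)."""
    assert hand.shape == (1, 21, 3)        # normalized landmarks from Stage 2
    assert handedness.shape == (1, 1)      # handedness score from Stage 2
    assert world_hand.shape == (1, 21, 3)  # world landmarks from Stage 2

    # Stand-in for inference: flatten the three inputs and project
    # them to 128 dimensions with fixed (illustrative) weights.
    feats = np.concatenate([hand.ravel(), handedness.ravel(), world_hand.ravel()])
    rng = np.random.default_rng(0)
    weights = rng.standard_normal((feats.size, 128))
    return (feats @ weights)[None, :]      # shape [1, 128]
```

In the actual pipeline the result would be attached to the HandRegion object rather than returned to the caller.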

Next Steps

With a gesture embedding computed for each detected hand, the pipeline proceeds to:

  • Model 4: Classify embeddings into predefined gestures
  • Custom Gestures: Train user-defined gesture recognizers
  • Similarity Matching: Compare against gesture databases
  • Real-time Classification: Low-latency, per-frame gesture recognition
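The similarity-matching step can be sketched with cosine similarity over the 128-dimensional embeddings. This assumes a plain dict keyed by gesture name as the "database"; `best_match` is a hypothetical helper, not part of the pipeline code.

```python
import numpy as np

def best_match(embedding, database):
    """Return the gesture label whose stored embedding has the highest
    cosine similarity to the query embedding."""
    names = list(database)
    mat = np.stack([database[n].ravel() for n in names])  # [N, 128]
    query = embedding.ravel()
    sims = (mat @ query) / (np.linalg.norm(mat, axis=1) * np.linalg.norm(query))
    return names[int(np.argmax(sims))]
```

Cosine similarity is a natural choice here because the embedding is meant to capture pose independent of hand size, so direction matters more than magnitude.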