Dynamic Finger Gesture Recognition

  • Inspiration: Rather than relying solely on pre-defined static gestures, I designed a system that detects finger bends, inspired by how touchpads interpret input.
  • Landmark Angles: The pipeline identifies all 21 hand landmark points. To detect a bend in the index finger, I calculate the angle at landmark 6 (the PIP joint, i.e. the first joint above the knuckle) between the segments running to landmark 5 (base of the index finger) and landmark 8 (tip of the index finger). A fully extended finger gives an angle close to 180°; if the angle at this joint falls between 60° and 130°, the finger is considered bent.

  • Multi-Finger Gestures: Using the same angle test on a second finger, I implemented recognition for two-finger bend gestures (e.g., index + middle finger), allowing for more complex and expressive controls.
  • Relaxed State: To avoid gesture overlap and false positives, I defined a “relaxed” state for fingers, ensuring the system distinguishes between intentional and unintentional gestures.
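The angle test described above can be checked by hand on a pair of synthetic fingers. The sketch below is only an illustration: the coordinates are made up (not real tracker output), and the names joint_angle_deg and in_bend_range are hypothetical, not part of the original pipeline.

```python
import math

def joint_angle_deg(a, b, c):
    """Angle in degrees at point b between the rays b->a and b->c (2D points)."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    # atan2 on (|cross|, dot) yields the unsigned angle in [0, 180] degrees.
    return math.degrees(math.atan2(abs(cross), dot))

def in_bend_range(angle, lower=60.0, upper=130.0):
    """True when the joint angle lies inside the 60-130 degree 'bent' window."""
    return lower <= angle <= upper

# Made-up normalized coordinates keyed by landmark index (5=base, 6=joint, 8=tip).
straight = {5: (0.50, 0.60), 6: (0.50, 0.50), 8: (0.50, 0.30)}  # collinear points
bent = {5: (0.50, 0.60), 6: (0.50, 0.50), 8: (0.60, 0.50)}      # tip folded sideways

for name, lm in (("straight", straight), ("bent", bent)):
    angle = joint_angle_deg(lm[5], lm[6], lm[8])
    print(name, round(angle), in_bend_range(angle))
# prints:
# straight 180 False
# bent 90 True
```

The straight finger lands at 180°, outside the window, while the right-angle fold lands at 90°, inside it. Angles below 60° (a tightly curled finger) also fall outside the window and would therefore not count as a bend under these thresholds.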

This system supports:

  • Single finger bend (e.g., index bend)
  • Double finger bend (e.g., index + middle bend)

Helper functions (example implementation)

Below are short, copy-pasteable Python helpers you can place in gesture_processor.py (or inline in your pipeline) to detect finger bends and deduce single- or multi-finger dynamic gestures. They assume landmarks are lists/arrays of 21 (x,y[,z]) coordinates normalized to the same reference used elsewhere in the pipeline.

import numpy as np

def calculate_angle(a, b, c):
    """Return the angle (in degrees) at point b formed by points a-b-c.

    a, b, c: iterable 2D coordinates (x,y) or arrays.
    """
    # Only x and y are used; a z component, if present, is ignored.
    a = np.array(a[:2], dtype=np.float32)
    b = np.array(b[:2], dtype=np.float32)
    c = np.array(c[:2], dtype=np.float32)

    ba = a - b
    bc = c - b
    # The small epsilon avoids division by zero when two points coincide.
    denom = (np.linalg.norm(ba) * np.linalg.norm(bc)) + 1e-8
    cos_angle = np.dot(ba, bc) / denom
    cos_angle = np.clip(cos_angle, -1.0, 1.0)
    angle_rad = np.arccos(cos_angle)
    return float(np.degrees(angle_rad))


def detect_index_finger_bend(landmarks, lower=60.0, upper=130.0):
    """Detect whether the index finger is bent using landmark indices 5-6-8.

    Returns (is_bent: bool, angle: float).
    Landmark indices follow MediaPipe convention: 5=MCP (base), 6=PIP (joint), 8=tip.
    We compute the angle at the PIP joint (landmark 6) using vectors (6->5) and (6->8).
    """
    try:
        a = landmarks[5]
        b = landmarks[6]
        c = landmarks[8]
    except (IndexError, KeyError, TypeError):
        # Missing or incomplete landmarks: report "not bent" with a zero angle.
        return False, 0.0

    angle = calculate_angle(a, b, c)
    is_bent = lower <= angle <= upper
    return is_bent, angle


def detect_middle_finger_bend(landmarks, lower=60.0, upper=130.0):
    """Same as index but for middle finger (landmarks 9-10-12)."""
    try:
        a = landmarks[9]
        b = landmarks[10]
        c = landmarks[12]
    except (IndexError, KeyError, TypeError):
        # Missing or incomplete landmarks: report "not bent" with a zero angle.
        return False, 0.0

    angle = calculate_angle(a, b, c)
    is_bent = lower <= angle <= upper
    return is_bent, angle


def detect_gesture_type(region, landmarks, bend_thresholds=(60.0, 130.0)):
    """Determine dynamic gesture type and store it in `region.dynamic_gesture`.

    The function checks index and middle finger bends and sets:
    - 'index_only' if only index is bent
    - 'index_middle' if both index and middle are bent
    - 'relaxed' otherwise

    It also stores numeric angles and individual finger states on `region` for debugging.
    """
    idx_bent, idx_angle = detect_index_finger_bend(landmarks, *bend_thresholds)
    mid_bent, mid_angle = detect_middle_finger_bend(landmarks, *bend_thresholds)

    region.finger_states = {
        'index': {'bent': idx_bent, 'angle': idx_angle},
        'middle': {'bent': mid_bent, 'angle': mid_angle},
    }

    if idx_bent and not mid_bent:
        region.dynamic_gesture = 'index_only'
    elif idx_bent and mid_bent:
        region.dynamic_gesture = 'index_middle'
    else:
        # Also covers the middle-only case, which has no label of its own.
        region.dynamic_gesture = 'relaxed'

    return region.dynamic_gesture

Usage notes

  • Put these helpers in gesture_processor.py and call detect_gesture_type(region, region.landmarks) after landmark postprocessing.
  • Tune the lower/upper angle thresholds per device or camera angle if needed; the provided defaults (60°–130°) match the bend range described above.
  • The region object will receive dynamic_gesture and finger_states attributes for downstream handling and debugging.
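Because the gesture label is recomputed on every frame, it can flicker when an angle hovers near a threshold. One possible way to steady it for downstream handling, sketched here as an assumption and not as part of the original system, is to accept a new label only after it has appeared on several consecutive frames. The class name GestureStabilizer and the hold parameter are hypothetical.

```python
from collections import deque

class GestureStabilizer:
    """Report a gesture label only after `hold` consecutive identical frames."""

    def __init__(self, hold=3, initial="relaxed"):
        self.hold = hold
        self.recent = deque(maxlen=hold)  # sliding window of the latest labels
        self.stable = initial             # last label that passed the check

    def update(self, label):
        """Feed one per-frame label and return the current stable label."""
        self.recent.append(label)
        window_full = len(self.recent) == self.hold
        if window_full and len(set(self.recent)) == 1:
            self.stable = label
        return self.stable

# Hypothetical per-frame labels, e.g. values read from region.dynamic_gesture.
frames = ["relaxed", "index_only", "relaxed",
          "index_only", "index_only", "index_only", "index_middle"]

stabilizer = GestureStabilizer(hold=3)
stable = [stabilizer.update(label) for label in frames]
print(stable)
# prints: ['relaxed', 'relaxed', 'relaxed', 'relaxed', 'relaxed', 'index_only', 'index_only']
```

A larger hold value suppresses more flicker but delays recognition by that many frames, so it is worth tuning alongside the angle thresholds.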