Non-Maximum Suppression and Rotation Calculation
Overview
The second key function of the Hand Detector model handles post-processing of raw detections through Non-Maximum Suppression (NMS) and calculates hand rotation for optimal landmark extraction. This function ensures clean, non-overlapping detections and prepares rotated bounding boxes for the next pipeline stage.
Purpose and Functionality
This function performs critical post-processing tasks:
- Non-Maximum Suppression: Eliminates redundant and overlapping detections
- Rotation Calculation: Determines hand orientation from palm keypoints
- Bounding Box Optimization: Creates rotation-invariant regions for landmark detection
- Quality Filtering: Ensures only high-quality detections proceed to the next stage
Bounding Box Decoding Implementation
Detection Decoding Algorithm
def decode_bboxes(score_thresh, scores, bboxes, anchors):
    """
    wi, hi : NN input shape
    See mediapipe/calculators/tflite/tflite_tensors_to_detections_calculator.cc
    # Decodes the detection tensors generated by the model, based on
    # the SSD anchors and the specification in the options, into a vector of
    # detections. Each detection describes a detected object.
    scores: shape = [number of anchors = 896]
    bboxes: shape = [number of anchors x 18], 18 = 4 (bounding box: cx, cy, w, h) + 14 (7 palm keypoints x 2 coordinates)
    """
    regions = []
    scores = 1 / (1 + np.exp(-scores))  # sigmoid to map raw logits to [0, 1]
    detection_mask = scores > score_thresh
    det_scores = scores[detection_mask]
    if det_scores.size == 0:
        return regions
    det_bboxes = bboxes[detection_mask]
    det_anchors = anchors[detection_mask]
    scale = 128  # x_scale, y_scale, w_scale, h_scale
    # cx, cy, w, h = bboxes[i,:4]
    # cx = cx * anchor.w / wi + anchor.x_center
    # cy = cy * anchor.h / hi + anchor.y_center
    # lx = lx * anchor.w / wi + anchor.x_center
    # ly = ly * anchor.h / hi + anchor.y_center
    det_bboxes = det_bboxes * np.tile(det_anchors[:, 2:4], 9) / scale + np.tile(det_anchors[:, 0:2], 9)
    # w = w * anchor.w / wi (the previous line added anchor.x_center and
    # anchor.y_center to w and h, so we need to subtract them now)
    # h = h * anchor.h / hi
    det_bboxes[:, 2:4] = det_bboxes[:, 2:4] - det_anchors[:, 0:2]
    # Convert center format to top-left format: box = [cx - w*0.5, cy - h*0.5, w, h]
    det_bboxes[:, 0:2] = det_bboxes[:, 0:2] - det_bboxes[:, 2:4] * 0.5
    for i in range(det_bboxes.shape[0]):
        score = det_scores[i]
        box = det_bboxes[i, 0:4]
        kps = []
        # 0 : wrist
        # 1 : index finger joint
        # 2 : middle finger joint
        # 3 : ring finger joint
        # 4 : little finger joint
        # 5 :
        # 6 : thumb joint
        for kp in range(7):
            kps.append(det_bboxes[i, 4 + kp * 2 : 6 + kp * 2])
        regions.append(HandRegion(float(score), box, kps))
    return regions
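The anchor-decoding arithmetic above can be checked on a toy example. The anchor and raw offsets below are made-up illustrative numbers, not real model outputs; only the `scale = 128` constant comes from the function above.

```python
import numpy as np

scale = 128  # matches the x/y/w/h scale used in decode_bboxes

# One hypothetical anchor: (x_center, y_center, w, h) in normalized coordinates
anchor = np.array([0.5, 0.5, 1.0, 1.0])
# Hypothetical raw model output for that anchor: offsets for cx, cy, w, h
raw = np.array([12.8, -6.4, 25.6, 38.4])

# cx = cx_raw * anchor.w / scale + anchor.x_center (same pattern for cy)
cx = raw[0] * anchor[2] / scale + anchor[0]  # 0.5 + 0.1  = 0.6
cy = raw[1] * anchor[3] / scale + anchor[1]  # 0.5 - 0.05 = 0.45
# w and h are scaled by the anchor size but not shifted by its center
w = raw[2] * anchor[2] / scale               # 0.2
h = raw[3] * anchor[3] / scale               # 0.3

# Center format -> top-left format: box = [cx - w/2, cy - h/2, w, h]
box = [cx - w * 0.5, cy - h * 0.5, w, h]
print(box)  # [0.5, 0.3, 0.2, 0.3]
```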
Non-Maximum Suppression (NMS)
NMS Implementation
def non_max_suppression(regions, nms_thresh):
    # Boxes are in normalized coordinates; scale them to integers for NMSBoxes.
    boxes = [[int(x * 1000) for x in r.pd_box] for r in regions]
    scores = [r.pd_score for r in regions]
    # Score threshold is 0 here because regions were already filtered in decode_bboxes.
    # Note: OpenCV >= 4.5.4 returns a flat array of indices; older versions
    # return nested [[i]] arrays and need regions[i[0]] instead.
    indices = cv2.dnn.NMSBoxes(boxes, scores, 0, nms_thresh)
    return [regions[i] for i in indices]
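For environments without OpenCV, the same greedy suppression can be sketched in pure NumPy. This is an equivalent of the logic `cv2.dnn.NMSBoxes` applies, not the library's actual implementation; the `iou` and `nms` names are illustrative.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two boxes in [x, y, w, h] format."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, nms_thresh):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while len(order) > 0:
        best = order[0]
        keep.append(int(best))
        order = np.array([i for i in order[1:]
                          if iou(boxes[best], boxes[i]) <= nms_thresh])
    return keep

boxes = [[0.0, 0.0, 1.0, 1.0], [0.1, 0.1, 1.0, 1.0], [2.0, 2.0, 1.0, 1.0]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, 0.5))  # [0, 2] — box 1 overlaps box 0 too much (IoU ~0.68)
```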
Hand Rotation Calculation
Rotation Computation
def normalize_radians(angle):
    # Wrap the angle into the range [-pi, pi)
    return angle - 2 * pi * floor((angle + pi) / (2 * pi))

def detections_to_rect(regions):
    # Target: the wrist -> middle-finger direction should point "up" (pi/2)
    target_angle = pi * 0.5
    for region in regions:
        region.rect_w = region.pd_box[2]
        region.rect_h = region.pd_box[3]
        region.rect_x_center = region.pd_box[0] + region.rect_w / 2
        region.rect_y_center = region.pd_box[1] + region.rect_h / 2
        x0, y0 = region.pd_kps[0]  # wrist
        x1, y1 = region.pd_kps[2]  # middle finger joint
        # The y axis points down in image coordinates, hence the sign flip
        rotation = target_angle - atan2(-(y1 - y0), x1 - x0)
        region.rotation = normalize_radians(rotation)
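The rotation formula can be exercised on two simple keypoint configurations. The standalone `hand_rotation` helper below is an illustrative extraction of the per-region computation, not a function from the pipeline:

```python
from math import atan2, floor, pi

def normalize_radians(angle):
    # Wrap into [-pi, pi), as in the pipeline code
    return angle - 2 * pi * floor((angle + pi) / (2 * pi))

def hand_rotation(wrist, middle_mcp):
    """Angle that brings the wrist -> middle-finger direction to 'pointing up'."""
    x0, y0 = wrist
    x1, y1 = middle_mcp
    target_angle = pi * 0.5
    # y grows downward in image coordinates, hence the sign flip on the dy term
    return normalize_radians(target_angle - atan2(-(y1 - y0), x1 - x0))

# Hand already upright: middle finger directly above the wrist (smaller y)
print(hand_rotation((0.5, 0.8), (0.5, 0.6)))  # 0.0
# Hand pointing right: middle finger to the right of the wrist
print(hand_rotation((0.4, 0.5), (0.6, 0.5)))  # ~1.5708 (pi/2)
```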
Rectangle Transformation
def rect_transformation(regions, w, h):
    """w, h : source image width and height in pixels"""
    scale_x = 1.4   # horizontal enlargement factor
    scale_y = 2.4   # vertical enlargement factor
    shift_x = 0
    shift_y = -0.4  # shift the crop up, toward the fingers
    for region in regions:
        width = region.rect_w
        height = region.rect_h
        rotation = region.rotation
        if rotation == 0:
            region.rect_x_center_a = (region.rect_x_center + width * shift_x) * w
            region.rect_y_center_a = (region.rect_y_center + height * shift_y) * h
        else:
            # Rotate the shift vector by the hand rotation before applying it
            x_shift = w * width * shift_x * cos(rotation) - h * height * shift_y * sin(rotation)
            y_shift = w * width * shift_x * sin(rotation) + h * height * shift_y * cos(rotation)
            region.rect_x_center_a = region.rect_x_center * w + x_shift
            region.rect_y_center_a = region.rect_y_center * h + y_shift
        # Make the crop square on the longer side, then enlarge it
        long_side = max(width * w, height * h)
        region.rect_w_a = long_side * scale_x
        region.rect_h_a = long_side * scale_y
        region.rect_points = rotated_rect_to_points(region.rect_x_center_a, region.rect_y_center_a,
                                                    region.rect_w_a, region.rect_h_a, region.rotation, w, h)
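The helper `rotated_rect_to_points` is called above but not shown. A sketch consistent with the call site is given below: it rotates the four half-extent corner offsets around the center and normalizes by the image size. The corner order and normalization are assumptions; the actual helper may differ.

```python
from math import cos, sin

def rotated_rect_to_points(cx, cy, w, h, rotation, img_w, img_h):
    """Corners of a rotated rectangle, normalized by the image size.

    cx, cy, w, h are in pixels; rotation is in radians. Corner order is
    top-left, top-right, bottom-right, bottom-left (before rotation).
    """
    cos_r, sin_r = cos(rotation), sin(rotation)
    points = []
    for dx, dy in [(-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5)]:
        ox, oy = dx * w, dy * h          # corner offset from the center
        x = cx + ox * cos_r - oy * sin_r  # standard 2D rotation of the offset
        y = cy + ox * sin_r + oy * cos_r
        points.append([x / img_w, y / img_h])
    return points

# Axis-aligned sanity check: a 100x50 rect centered at (100, 50) in a 200x100 image
print(rotated_rect_to_points(100, 50, 100, 50, 0, 200, 100))
# [[0.25, 0.25], [0.75, 0.25], [0.75, 0.75], [0.25, 0.75]]
```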
Rotation Calculation Fallback
try:
    # Calculate rotation; detections_to_rect mutates the regions in place
    detections_to_rect([detection])
    return detection
except Exception as e:
    logging.warning(f"Rotation calculation failed: {e}")
    # Fall back to an axis-aligned rectangle
    detection.rotation = 0.0
    detection.rect_points = default_rectangle(detection.bbox)
    return detection
Configuration Parameters
NMS Settings
- nms_threshold: IoU threshold for suppression (default: 0.3)
- max_detections: Maximum number of detections to keep (default: 2)
- min_confidence: Minimum confidence for processing (default: 0.5)
Rotation Settings
- scale_factor: Rectangle enlargement factor (default: 2.6)
- rotation_smoothing: Temporal rotation smoothing (default: 0.3)
- keypoint_confidence_threshold: Minimum keypoint confidence (default: 0.5)
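The parameters above can be grouped into a single configuration object. The dataclass below is a sketch: the field names and defaults mirror the lists above, but the actual pipeline may store its settings differently.

```python
from dataclasses import dataclass

@dataclass
class PostProcessConfig:
    """Hypothetical container for the NMS and rotation settings listed above."""
    # NMS settings
    nms_threshold: float = 0.3       # IoU threshold for suppression
    max_detections: int = 2          # Maximum number of detections to keep
    min_confidence: float = 0.5      # Minimum confidence for processing
    # Rotation settings
    scale_factor: float = 2.6        # Rectangle enlargement factor
    rotation_smoothing: float = 0.3  # Temporal rotation smoothing
    keypoint_confidence_threshold: float = 0.5

config = PostProcessConfig()
print(config.nms_threshold, config.max_detections)  # 0.3 2
```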
Output Format
Processed HandRegion Structure
class ProcessedHandRegion:
    def __init__(self, original_detection):
        self.bbox = original_detection.bbox
        self.confidence = original_detection.confidence
        self.keypoints = original_detection.keypoints
        self.rotation = None         # Calculated rotation angle
        self.rect_points = None      # Rotated rectangle corners
        self.quality_score = None    # Overall quality metric
        self.processing_time = None  # Processing latency
Integration Points
Input Interface
- Raw Detections: Receives HandRegion objects from Function 1
- Configuration: NMS and rotation parameters
- Quality Metrics: Detection confidence and keypoint quality
Output Interface
- Clean Detections: Filtered, non-overlapping hand regions
- Rotation Data: Hand orientation information
- Next Stage: Feeds into Model 2 (Hand Landmarks)
Next Steps
After NMS and rotation calculation, the pipeline proceeds to:
- Model 2 (Hand Landmarks): Extract detailed 21-point landmarks using rotated regions
- Temporal Tracking: Maintain hand identity across frames
- Quality Assessment: Monitor detection stability and accuracy