Region Rectification and Cropping
Overview
Function 1 of the Hand Landmark Detection model performs region rectification and cropping. It transforms the rotated hand regions detected by Model 1 into upright, standardized 224x224 images suitable for landmark detection.
Purpose and Functionality
This function performs essential preprocessing tasks:
- Image Warping: Applies affine transformation to straighten hand regions
- Region Cropping: Extracts hand regions from the original full-resolution frame
- Standardization: Creates consistent 224x224 input format for the landmark model
- Orientation Correction: Uses rotation information to properly align hands
Rotated Rectangle to Points Conversion
Point Calculation Implementation
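from math import cos, sin

def rotated_rect_to_points(cx, cy, w, h, rotation, wi, hi):
    # wi and hi are accepted for call compatibility but are not used here
    b = cos(rotation) * 0.5
    a = sin(rotation) * 0.5
    # p0 and p1 are adjacent corners; at rotation = 0 they are the
    # bottom-left and top-left corners (image y grows downward)
    p0x = cx - a*h - b*w
    p0y = cy + b*h - a*w
    p1x = cx + a*h - b*w
    p1y = cy - b*h - a*w
    # p2 and p3 mirror p0 and p1 through the center (cx, cy)
    p2x = int(2*cx - p0x)
    p2y = int(2*cy - p0y)
    p3x = int(2*cx - p1x)
    p3y = int(2*cy - p1y)
    p0x, p0y, p1x, p1y = int(p0x), int(p0y), int(p1x), int(p1y)
    return [(p0x,p0y), (p1x,p1y), (p2x,p2y), (p3x,p3y)]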
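To make the corner ordering concrete, the sketch below repeats the same arithmetic in a standalone helper and evaluates it for an unrotated rectangle. The name `corners` is chosen purely for illustration and is not part of the pipeline; the input values are sample numbers, and image coordinates are assumed to have y increasing downward.

```python
from math import cos, sin

def corners(cx, cy, w, h, rotation):
    """Illustrative copy of the corner arithmetic (hypothetical helper)."""
    b = cos(rotation) * 0.5
    a = sin(rotation) * 0.5
    p0 = (cx - a*h - b*w, cy + b*h - a*w)
    p1 = (cx + a*h - b*w, cy - b*h - a*w)
    # The remaining two corners are reflections through the center
    p2 = (2*cx - p0[0], 2*cy - p0[1])
    p3 = (2*cx - p1[0], 2*cy - p1[1])
    return [(int(x), int(y)) for x, y in (p0, p1, p2, p3)]

# A 40x20 rectangle centered at (100, 100) with no rotation
upright = corners(100, 100, 40, 20, 0.0)
print(upright)  # [(80, 110), (80, 90), (120, 90), (120, 110)]
```

For rotation = 0 the points come out as bottom-left, top-left, top-right, bottom-right. This is why a warp built from `rect_points[1:]` uses the top-left, top-right, and bottom-right corners.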
Image Warping Process
Affine Transformation Implementation
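import cv2
import numpy as np

def warp_rect_img(rect_points, img, w, h):
    # Skip rect_points[0]; three corners are enough to define the affine map
    src = np.array(rect_points[1:], dtype=np.float32)
    # Destination corners as (x, y); identical to (h, 0), (h, w) when w == h
    dst = np.array([(0, 0), (w, 0), (w, h)], dtype=np.float32)
    mat = cv2.getAffineTransform(src, dst)
    return cv2.warpAffine(img, mat, (w, h))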
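`cv2.getAffineTransform` finds the 2x3 matrix that maps three source points onto three destination points. As a rough illustration of what that matrix contains, the sketch below solves the same linear system in plain Python with Cramer's rule. The helper names `det3`, `affine_from_points`, and `apply_affine` are made up for this example, and the corner values are sample numbers for an unrotated rectangle, not output from the real pipeline.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)

def affine_from_points(src, dst):
    """Return two rows [a, b, c] with out = a*x + b*y + c for each point pair."""
    base = [[x, y, 1.0] for x, y in src]
    d = det3(base)
    if d == 0:
        raise ValueError("source points are collinear")
    rows = []
    for k in range(2):  # k = 0 solves the x output, k = 1 the y output
        rhs = [p[k] for p in dst]
        coeffs = []
        for col in range(3):
            # Cramer's rule: replace one column with the right-hand side
            m = [row[:] for row in base]
            for r in range(3):
                m[r][col] = rhs[r]
            coeffs.append(det3(m) / d)
        rows.append(coeffs)
    return rows

def apply_affine(mat, x, y):
    """Apply the 2x3 matrix to a single point."""
    return tuple(r[0]*x + r[1]*y + r[2] for r in mat)

# Top-left, top-right, bottom-right of a sample 40x20 rectangle
src = [(80, 90), (120, 90), (120, 110)]
dst = [(0, 0), (224, 0), (224, 224)]
mat = affine_from_points(src, dst)
center = apply_affine(mat, 100, 100)
print(center)  # approximately (112.0, 112.0)
```

The rectangle's center lands in the middle of the 224x224 output, as expected. In this sample the horizontal and vertical scale factors differ (5.6 and 11.2) because the source rectangle is not square.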
Rectangle Transformation System
Transformation Algorithm
The rectangle transformation shifts and enlarges each detected hand region, converts it from normalized to pixel coordinates, and computes the corner points used for warping:
def rect_transformation(regions, w, h):
    # w, h: frame width and height in pixels; region fields are normalized
    scale_x = 1.4  # Lowered from 2.0
    scale_y = 2.4
    shift_x = 0
    shift_y = -0.4
    for region in regions:
        width = region.rect_w
        height = region.rect_h
        rotation = region.rotation  # This will now always be 0.0
        # The following lines are for rotation = 0
        region.rect_x_center_a = (region.rect_x_center + width * shift_x) * w
        region.rect_y_center_a = (region.rect_y_center + height * shift_y) * h
        # The 'else' block for rotated cases is no longer needed as rotation is always 0
        # else:
        #     x_shift = (w * width * shift_x * cos(rotation) - h * height * shift_y * sin(rotation))
        #     y_shift = (w * width * shift_x * sin(rotation) + h * height * shift_y * cos(rotation))
        #     region.rect_x_center_a = region.rect_x_center*w + x_shift
        #     region.rect_y_center_a = region.rect_y_center*h + y_shift
        long_side = max(width * w, height * h)
        region.rect_w_a = long_side * scale_x
        region.rect_h_a = long_side * scale_y
        region.rect_points = rotated_rect_to_points(region.rect_x_center_a, region.rect_y_center_a, region.rect_w_a, region.rect_h_a, region.rotation, w, h)
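As a worked example of the arithmetic for the rotation = 0 case, the sketch below runs the shift and scale steps on one made-up region. The 640x480 frame size and the normalized region values are assumptions chosen for illustration only.

```python
# Hypothetical inputs: a 640x480 frame and one normalized detection
frame_w, frame_h = 640, 480
rect_x_center, rect_y_center = 0.5, 0.5
rect_w, rect_h = 0.2, 0.3

scale_x, scale_y = 1.4, 2.4
shift_x, shift_y = 0, -0.4

# Shift the center (rotation = 0 case) and convert to pixels
x_center_a = (rect_x_center + rect_w * shift_x) * frame_w
y_center_a = (rect_y_center + rect_h * shift_y) * frame_h

# Both output sides are derived from the longer side of the detection, in pixels
long_side = max(rect_w * frame_w, rect_h * frame_h)
rect_w_a = long_side * scale_x
rect_h_a = long_side * scale_y

print(round(x_center_a, 1), round(y_center_a, 1))  # 320.0 182.4
print(round(rect_w_a, 1), round(rect_h_a, 1))      # 201.6 345.6
```

With a negative `shift_y`, the center moves up by 40% of the detection height (from y = 240 to about 182 px here). Because both sides are derived from `long_side`, the resulting rectangle measures 1.4 by 2.4 times the longer side of the detection rather than being square.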
Key Parameters
- Scale Factors: scale_x = 1.4, scale_y = 2.4 for optimal hand region coverage
- Shift Parameters: shift_x = 0, shift_y = -0.4 for proper hand centering
- Output Size: Always 224x224 pixels for consistent model input
- Rotation Handling: Currently optimized for rotation = 0.0 cases
After region rectification and cropping, the pipeline proceeds to:
- Function 2: Landmark detection and smoothing on warped images
- Quality Validation: Ensure adequate input quality for landmark detection
- Coordinate Mapping: Prepare for landmark coordinate transformation back to original image space
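The coordinate mapping step is not detailed in this section. One way to do it, sketched below under the assumption that landmarks come back normalized to [0, 1] within the crop, is to interpolate along the rectangle's edges using the same three corners that the warp uses. The function name `crop_to_image` and the sample values are illustrative, not taken from the pipeline.

```python
def crop_to_image(rect_points, lx, ly):
    """Map a normalized crop coordinate (lx, ly) back to frame pixels.

    Assumes rect_points is ordered as returned by rotated_rect_to_points:
    bottom-left, top-left, top-right, bottom-right at rotation = 0.
    """
    _, top_left, top_right, bottom_right = rect_points
    # Edge vectors of the rectangle in frame coordinates
    ex = (top_right[0] - top_left[0], top_right[1] - top_left[1])
    ey = (bottom_right[0] - top_right[0], bottom_right[1] - top_right[1])
    x = top_left[0] + lx * ex[0] + ly * ey[0]
    y = top_left[1] + lx * ex[1] + ly * ey[1]
    return x, y

# Sample corners for a 40x20 rectangle centered at (100, 100)
rect_points = [(80, 110), (80, 90), (120, 90), (120, 110)]
print(crop_to_image(rect_points, 0.5, 0.5))  # (100.0, 100.0)
```

The center of the crop maps back to the center of the rectangle, and (0, 0) maps to its top-left corner.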