Vision
Computer-vision nodes that analyze the image and hand you back data. They run Apple’s Vision and CoreML frameworks on a background queue, pass the source through untouched, and expose results both as overlay textures and as float ports you can route into any parameter.
Detection is throttled and smoothed (exponential moving average) so the tracked values glide at full render rate without jitter. Tuning fields like confidence and detection rate live in the inspector; the ports below are the connectable inputs and outputs.
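The throttle-plus-smoothing behavior can be sketched as below. This is a minimal illustration under stated assumptions, not the nodes' actual implementation: the `alpha` weight, the `detect_every` interval, and both function names are hypothetical.

```python
def ema_step(previous, target, alpha):
    """Move `previous` a fraction `alpha` of the way toward `target`."""
    return previous + alpha * (target - previous)


def smooth_track(detections, detect_every=3, alpha=0.5, initial=0.5):
    """Simulate a throttled detector feeding a per-frame EMA.

    `detections` holds one raw measurement per render frame, but only every
    `detect_every`-th one is sampled (the throttle). The smoothed value is
    still updated on every frame, so it glides between samples.
    """
    smoothed = initial
    target = initial
    output = []
    for frame, raw in enumerate(detections):
        if frame % detect_every == 0:  # throttled detection tick
            target = raw
        smoothed = ema_step(smoothed, target, alpha)
        output.append(smoothed)
    return output


# A tracked value jumps from 0.5 to 0.9; the port eases toward it.
values = smooth_track([0.9] * 6, detect_every=3, alpha=0.5, initial=0.5)
print([round(v, 3) for v in values])
```

A larger `alpha` follows the detector more tightly at the cost of more visible jitter; a smaller one glides more but lags behind fast motion.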
Face Detection
faceDetection · Vision

Detects faces in the image and emits a box overlay, an alpha mask, and per-face position/size data.
Inputs

| Port | Type | Default | Range | Notes |
|---|---|---|---|---|
| texture | texture | - | - | Image to analyze. |

Outputs

| Port | Type | Default | Range | Notes |
|---|---|---|---|---|
| texture | texture | - | - | Source passthrough (zero-copy). |
| overlay | texture | - | - | Transparent canvas with bounding boxes. |
| mask | texture | - | - | White inside face regions, transparent outside. |
| faceCount | float | - | - | Number of detected faces. |
| face1X | float | - | 0 – 1 | Center X of the largest face (0.5 if none). |
| face1Y | float | - | 0 – 1 | Center Y of the largest face. |
| face1Size | float | - | 0 – 1 | Normalized long-edge size. |

Runs VNDetectFaceRectanglesRequest (roughly 5–15 ms per frame on Apple Silicon; no model file needed). Composite overlay back over the video, use mask to isolate or blur faces, or drive effects from the largest face’s position. Box and mask styling, confidence, and max-face count are inspector fields.
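To make the float outputs concrete, here is a rough sketch of how they could be derived from a list of detected boxes. Vision reports face bounding boxes in normalized units with a bottom-left origin; everything else here is an assumption for illustration: the function name `face_ports`, picking the "largest" face by long edge, flipping Y to a top-left origin, and the Y and size values used when no face is found.

```python
def face_ports(boxes):
    """Derive faceCount / face1X / face1Y / face1Size from normalized boxes.

    Each box is (x, y, width, height) in 0-1 units with a bottom-left origin.
    """
    if not boxes:
        # The port table gives 0.5 for face1X when nothing is detected;
        # 0.5 for Y and 0.0 for size are assumptions.
        return {"faceCount": 0.0, "face1X": 0.5, "face1Y": 0.5, "face1Size": 0.0}

    # Assumption: "largest" means the box with the longest edge.
    x, y, w, h = max(boxes, key=lambda b: max(b[2], b[3]))
    return {
        "faceCount": float(len(boxes)),
        "face1X": x + w / 2.0,
        "face1Y": 1.0 - (y + h / 2.0),  # flip to a top-left origin (assumed)
        "face1Size": max(w, h),
    }


ports = face_ports([(0.1, 0.2, 0.2, 0.3), (0.5, 0.4, 0.1, 0.1)])
empty = face_ports([])
print(ports)
```

Because the values are normalized, they can be routed into any 0–1 parameter without knowing the source resolution.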
Pose Tracking
poseTracking · Vision

Tracks a human skeleton and emits a passthrough, a skeleton overlay, a detected flag, and X/Y ports for 13 joints.
Inputs

| Port | Type | Default | Range | Notes |
|---|---|---|---|---|
| texture | texture | - | - | Image to analyze. |

Outputs

| Port | Type | Default | Range | Notes |
|---|---|---|---|---|
| texture | texture | - | - | Source passthrough. |
| overlay | texture | - | - | Transparent canvas with bones + joints. |
| detected | float | - | - | 1 if a body was found, else 0. |
| headX / headY | float | - | 0 – 1 | Nose joint, normalized, top-left origin. |
| leftShoulderX / Y | float | - | 0 – 1 | Left shoulder. |
| rightShoulderX / Y | float | - | 0 – 1 | Right shoulder. |
| leftElbowX / Y | float | - | 0 – 1 | Left elbow. |
| rightElbowX / Y | float | - | 0 – 1 | Right elbow. |
| leftWristX / Y | float | - | 0 – 1 | Left wrist. |
| rightWristX / Y | float | - | 0 – 1 | Right wrist. |
| leftHipX / Y | float | - | 0 – 1 | Left hip. |
| rightHipX / Y | float | - | 0 – 1 | Right hip. |
| leftKneeX / Y | float | - | 0 – 1 | Left knee. |
| rightKneeX / Y | float | - | 0 – 1 | Right knee. |
| leftAnkleX / Y | float | - | 0 – 1 | Left ankle. |
| rightAnkleX / Y | float | - | 0 – 1 | Right ankle. |

Runs VNDetectHumanBodyPoseRequest on the first detected body. Each of the 13 joints is exposed as separate ...X and ...Y float ports (the table pairs them for brevity), normalized 0–1 with a top-left origin and EMA-smoothed. Drive instancers, particles or distortions from a wrist or the head, or composite the skeleton overlay for a motion-capture look.
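As an illustration of driving a parameter from joint ports, the sketch below maps a normalized joint to pixels and derives a single control value from the two wrists. The helper names `to_pixels` and `wrist_spread` and the dictionary of port values are hypothetical; only the port names and the top-left, 0–1 convention come from the table above.

```python
import math


def to_pixels(x, y, width, height):
    """Map a normalized, top-left-origin joint to pixel coordinates."""
    return x * width, y * height


def wrist_spread(joints, detected):
    """Return the normalized distance between the wrists, or 0 with no body.

    The distance is measured in normalized units, so it ignores the frame's
    aspect ratio; scale X and Y by the frame size first if that matters.
    """
    if detected < 0.5:  # the detected port is 1 or 0
        return 0.0
    dx = joints["rightWristX"] - joints["leftWristX"]
    dy = joints["rightWristY"] - joints["leftWristY"]
    return math.hypot(dx, dy)


joints = {
    "leftWristX": 0.3, "leftWristY": 0.5,
    "rightWristX": 0.7, "rightWristY": 0.8,
}
spread = wrist_spread(joints, detected=1.0)
left_px = to_pixels(joints["leftWristX"], joints["leftWristY"], 1920, 1080)
print(spread, left_px)
```

Gating on `detected` keeps the derived value from jumping when the body leaves the frame.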
Optical Flow
opticalFlow · Vision

Computes per-pixel motion between consecutive frames, encoded as a texture (RG = motion XY, B = magnitude).
Inputs

| Port | Type | Default | Range | Notes |
|---|---|---|---|---|
| texture | texture | - | - | Current frame; compared against the previous one. |
| sensitivity | float | 10 | - | Motion amplification. |

Outputs

| Port | Type | Default | Range | Notes |
|---|---|---|---|---|
| texture | texture | - | - | R/G = motion X/Y (0.5 = none), B = magnitude. |

Estimates flow with a Horn-Schunck-style gradient method against the previous frame. The encoded vectors are ready to drive displacement, speed masks or motion-reactive effects: feed the texture into a Noise Displace node or read its blue channel as a motion amount. Pre-blur and a motion threshold are inspector fields.
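The channel encoding can be sketched as follows. The table only states that 0.5 in R/G means no motion and that B carries magnitude; treating `sensitivity` as a plain multiplier applied before encoding, and the clamping to 0–1, are assumptions made for this example. The function names are hypothetical.

```python
import math


def clamp01(v):
    """Clamp a value to the 0-1 range a texture channel can hold."""
    return min(1.0, max(0.0, v))


def encode_flow(vx, vy, sensitivity=10.0):
    """Pack a signed motion vector into (r, g, b) channel values.

    Assumption: motion is amplified by `sensitivity`, then offset so that
    0.5 means no motion; b holds the amplified magnitude.
    """
    r = clamp01(0.5 + vx * sensitivity)
    g = clamp01(0.5 + vy * sensitivity)
    b = clamp01(math.hypot(vx, vy) * sensitivity)
    return r, g, b


def decode_flow(r, g, sensitivity=10.0):
    """Recover the signed motion vector from R/G under the same assumption."""
    return (r - 0.5) / sensitivity, (g - 0.5) / sensitivity


still = encode_flow(0.0, 0.0)
moving = encode_flow(0.01, -0.02)
recovered = decode_flow(moving[0], moving[1])
print(still, moving, recovered)
```

Values below 0.5 in R or G therefore indicate motion in the negative direction on that axis, which is why a displacement shader would subtract 0.5 before using them.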
Depth Estimation
depthEstimation · Vision

Produces a grayscale monocular depth map from any 2D image, with range remap, invert, smoothing and depth-slab isolation.
Inputs

| Port | Type | Default | Range | Notes |
|---|---|---|---|---|
| texture | texture | - | - | Image to estimate depth from. |
| near | float | 0 | 0 – 1 | Lower depth bound; maps to white. |
| far | float | 1 | 0 – 1 | Upper depth bound; maps to black. |
| smoothing | float | 0.4 | 0 – 1 | Temporal EMA; 1 = frozen frame. |
| invert | bool | false | - | Swap near/far so close = black. |
| isolateEnabled | bool | false | - | Enable depth-slab isolation. |
| isolateTarget | float | 0.5 | 0 – 1 | Center depth of the band. |
| isolateWidth | float | 0.2 | 0 – 1 | Width of the band. |
| isolateFalloff | float | 0.08 | 0 – 1 | Smooth falloff outside the band. |

Outputs

| Port | Type | Default | Range | Notes |
|---|---|---|---|---|
| depth | texture | - | - | Grayscale depth map (16-bit float). |

Runs the DepthAnything V2 CoreML model to infer depth from a single image, so no depth camera is required. A Metal post-pass remaps the range, inverts, smooths over time, and can isolate a depth slab. Pair it with Depth Displacement or 3D Effects to push foreground and background apart.
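The post-pass steps can be approximated per pixel as below. This is a sketch of the general technique rather than the node's shader: it assumes the raw depth `d` is a 0–1 value where smaller means closer, that the slab falloff uses a smoothstep curve, and that `smoothing` weights the previous frame. All function names are hypothetical; the parameter names and defaults mirror the input ports.

```python
def smoothstep(edge0, edge1, x):
    """Hermite interpolation from 0 at edge0 to 1 at edge1."""
    t = min(1.0, max(0.0, (x - edge0) / (edge1 - edge0)))
    return t * t * (3.0 - 2.0 * t)


def remap_depth(d, near=0.0, far=1.0, invert=False):
    """Map `near` to white (1.0) and `far` to black (0.0); invert swaps them."""
    span = max(far - near, 1e-6)  # guard against near == far
    t = min(1.0, max(0.0, (d - near) / span))
    return t if invert else 1.0 - t


def isolate_slab(d, target=0.5, width=0.2, falloff=0.08):
    """Return 1.0 inside the band, fading to 0.0 over `falloff` outside it."""
    half = width / 2.0
    distance = abs(d - target)
    return 1.0 - smoothstep(half, half + max(falloff, 1e-6), distance)


def temporal_smooth(previous, current, smoothing=0.4):
    """EMA over time; smoothing = 1 holds the previous frame indefinitely."""
    return smoothing * previous + (1.0 - smoothing) * current


print(remap_depth(0.25), isolate_slab(0.55), temporal_smooth(0.2, 0.8))
```

With the defaults, a pixel at depth 0.55 sits inside the 0.4–0.6 band and is fully kept, while one at 0.9 is fully rejected.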