Vision

Computer-vision nodes that analyze the incoming image and return data. They run Apple’s Vision and Core ML frameworks on a background queue, pass the source through untouched, and expose results both as overlay textures and as float ports you can route into any parameter.

Detection runs at a throttled rate, and its results are smoothed with an exponential moving average (EMA) so the tracked values glide at the full render rate without jitter. Tuning fields like confidence and detection rate live in the inspector; the ports below are the connectable inputs and outputs.
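The interplay between throttled detection and per-frame smoothing can be sketched as below. This is a minimal illustration rather than the node's actual code: the class name, the `alpha` weight and the two callbacks are assumptions made for the example.

```python
class SmoothedValue:
    """Exponential moving average over sparse detection results."""

    def __init__(self, alpha: float, initial: float = 0.5) -> None:
        # alpha is a hypothetical smoothing weight in (0, 1]; 1 disables smoothing.
        self.alpha = alpha
        self.value = initial
        self.target = initial

    def on_detection(self, measured: float) -> None:
        # Called at the throttled detection rate with the latest raw result.
        self.target = measured

    def on_render_frame(self) -> float:
        # Called every rendered frame: move a fraction of the way to the target.
        self.value += self.alpha * (self.target - self.value)
        return self.value


face_x = SmoothedValue(alpha=0.5, initial=0.5)
face_x.on_detection(1.0)  # one detection arrives...
trace = [face_x.on_render_frame() for _ in range(3)]  # ...three frames render
print(trace)  # [0.75, 0.875, 0.9375]
```

A smaller `alpha` gives a slower, smoother glide; a larger one follows the detector more tightly at the cost of more visible jitter.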

Face Detection

faceDetectionVision

Detects faces in the image and emits a box overlay, an alpha mask, and per-face position/size data.

Inputs

| Port | Type | Default | Range | Notes |
|---|---|---|---|---|
| texture | texture | - | - | Image to analyze. |

Outputs

| Port | Type | Default | Range | Notes |
|---|---|---|---|---|
| texture | texture | - | - | Source passthrough (zero-copy). |
| overlay | texture | - | - | Transparent canvas with bounding boxes. |
| mask | texture | - | - | White inside face regions, transparent outside. |
| faceCount | float | - | - | Number of detected faces. |
| face1X | float | - | 0–1 | Center X of the largest face (0.5 if none). |
| face1Y | float | - | 0–1 | Center Y of the largest face. |
| face1Size | float | - | 0–1 | Normalized long-edge size. |

Runs Vision’s VNDetectFaceRectanglesRequest (~5–15 ms/frame on Apple Silicon, no model file needed). Composite the overlay output back over the video, use mask to isolate or blur faces, or drive effects from the largest face’s position. Box and mask styling, confidence, and max-face count are inspector fields.
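As a rough illustration of how the float outputs relate to the detected boxes, the sketch below derives them from a list of normalized rectangles. The function name, the box layout, the choice of area as the measure of "largest", and the fallbacks other than face1X = 0.5 are assumptions for the example, not documented behavior.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height), normalized 0-1


def face_ports(boxes: List[Box]) -> Dict[str, float]:
    """Derive faceCount / face1X / face1Y / face1Size from normalized boxes."""
    if not boxes:
        # The port table documents 0.5 for face1X when no face is found;
        # reusing 0.5 for Y and 0 for size is an assumption.
        return {"faceCount": 0.0, "face1X": 0.5, "face1Y": 0.5, "face1Size": 0.0}
    # "Largest" is taken here to mean largest area (an assumption).
    x, y, w, h = max(boxes, key=lambda b: b[2] * b[3])
    return {
        "faceCount": float(len(boxes)),
        "face1X": x + w / 2,       # box center
        "face1Y": y + h / 2,
        "face1Size": max(w, h),    # long edge
    }


ports = face_ports([(0.1, 0.1, 0.2, 0.2), (0.5, 0.25, 0.25, 0.5)])
print(ports["face1X"], ports["face1Y"], ports["face1Size"])  # 0.625 0.5 0.5
```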

Pose Tracking

poseTrackingVision

Tracks a human skeleton and emits a passthrough, a skeleton overlay, a detected flag, and X/Y ports for 13 joints.

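Inputs

| Port | Type | Default | Range | Notes |
|---|---|---|---|---|
| texture | texture | - | - | Image to analyze. |

Outputs

| Port | Type | Default | Range | Notes |
|---|---|---|---|---|
| texture | texture | - | - | Source passthrough. |
| overlay | texture | - | - | Transparent canvas with bones + joints. |
| detected | float | - | - | 1 if a body was found, else 0. |
| headX / headY | float | - | 0–1 | Nose joint, normalized, top-left origin. |
| leftShoulderX / Y | float | - | 0–1 | Left shoulder. |
| rightShoulderX / Y | float | - | 0–1 | Right shoulder. |
| leftElbowX / Y | float | - | 0–1 | Left elbow. |
| rightElbowX / Y | float | - | 0–1 | Right elbow. |
| leftWristX / Y | float | - | 0–1 | Left wrist. |
| rightWristX / Y | float | - | 0–1 | Right wrist. |
| leftHipX / Y | float | - | 0–1 | Left hip. |
| rightHipX / Y | float | - | 0–1 | Right hip. |
| leftKneeX / Y | float | - | 0–1 | Left knee. |
| rightKneeX / Y | float | - | 0–1 | Right knee. |
| leftAnkleX / Y | float | - | 0–1 | Left ankle. |
| rightAnkleX / Y | float | - | 0–1 | Right ankle. |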

Runs Vision’s VNDetectHumanBodyPoseRequest on the first detected body. Each of the 13 joints is exposed as a separate ...X and ...Y float port (the table pairs them for brevity), normalized 0–1 with a top-left origin and EMA-smoothed. Drive instancers, particles or distortions from a wrist or the head, or composite the skeleton overlay for a motion-capture look.
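Vision reports recognized points in normalized coordinates with a lower-left origin, whereas these ports use a top-left origin, so a vertical flip presumably happens inside the node. The sketch below shows that flip and a simple way to map a joint port onto a parameter range; both helper names and the rotation parameter are hypothetical.

```python
from typing import Tuple


def to_top_left(x: float, y: float) -> Tuple[float, float]:
    """Flip a normalized lower-left-origin point to a top-left origin."""
    return x, 1.0 - y


def map_range(value: float, out_min: float, out_max: float) -> float:
    """Linearly map a normalized 0-1 port value onto a parameter range."""
    return out_min + value * (out_max - out_min)


# A wrist reported at (0.25, 0.75) in lower-left coordinates...
wrist_x, wrist_y = to_top_left(0.25, 0.75)
# ...drives a hypothetical rotation parameter spanning 0-360 degrees.
rotation = map_range(wrist_x, 0.0, 360.0)
print(wrist_x, wrist_y, rotation)  # 0.25 0.25 90.0
```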

Optical Flow

opticalFlowVision

Computes per-pixel motion between consecutive frames, encoded as a texture (RG = motion XY, B = magnitude).

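Inputs

| Port | Type | Default | Range | Notes |
|---|---|---|---|---|
| texture | texture | - | - | Current frame; compared against the previous one. |
| sensitivity | float | 10 | - | Motion amplification. |

Outputs

| Port | Type | Default | Range | Notes |
|---|---|---|---|---|
| texture | texture | - | - | R/G = motion X/Y (0.5 = none), B = magnitude. |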

Estimates flow with a Horn-Schunck-style gradient method against the previous frame. The encoded vectors are ready to drive displacement, speed masks or motion-reactive effects - feed it into a Noise Displace or read its blue channel as a motion amount. Pre-blur and a motion threshold are inspector fields.
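To make the texture encoding concrete, here is one plausible way a motion vector could be packed into the R, G and B channels and read back. The exact scaling the node uses is not documented here; the `0.5 + 0.5 * v` mapping, the clamping and both function names are assumptions for illustration.

```python
import math
from typing import Tuple


def clamp01(v: float) -> float:
    return min(1.0, max(0.0, v))


def encode_flow(dx: float, dy: float, sensitivity: float) -> Tuple[float, float, float]:
    """Pack a motion vector into (R, G, B); 0.5 in R/G means no motion."""
    sx, sy = dx * sensitivity, dy * sensitivity   # amplify small motions
    r = clamp01(0.5 + 0.5 * sx)                   # signed X, centered on 0.5
    g = clamp01(0.5 + 0.5 * sy)                   # signed Y, centered on 0.5
    b = clamp01(math.hypot(sx, sy))               # unsigned magnitude
    return r, g, b


def decode_flow(r: float, g: float, sensitivity: float) -> Tuple[float, float]:
    """Recover the (unclamped) motion vector from the R and G channels."""
    return (r - 0.5) * 2.0 / sensitivity, (g - 0.5) * 2.0 / sensitivity


still = encode_flow(0.0, 0.0, sensitivity=10.0)
moving = encode_flow(0.03, -0.04, sensitivity=10.0)
print(still)  # (0.5, 0.5, 0.0)
```

A downstream effect that only needs a motion amount can read the blue channel directly, while a displacement effect would decode R and G back into a signed vector.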

Depth Estimation

depthEstimationVision

Produces a grayscale monocular depth map from any 2D image, with range remap, invert, smoothing and depth-slab isolation.

Inputs

| Port | Type | Default | Range | Notes |
|---|---|---|---|---|
| texture | texture | - | - | Image to estimate depth from. |
| near | float | 0 | 0–1 | Lower depth bound; maps to white. |
| far | float | 1 | 0–1 | Upper depth bound; maps to black. |
| smoothing | float | 0.4 | 0–1 | Temporal EMA; 1 = frozen frame. |
| invert | bool | false | - | Swap near/far so close = black. |
| isolateEnabled | bool | false | - | Enable depth-slab isolation. |
| isolateTarget | float | 0.5 | 0–1 | Center depth of the band. |
| isolateWidth | float | 0.2 | 0–1 | Width of the band. |
| isolateFalloff | float | 0.08 | 0–1 | Smooth falloff outside the band. |

Outputs

| Port | Type | Default | Range | Notes |
|---|---|---|---|---|
| depth | texture | - | - | Grayscale depth map (16-bit float). |

Runs the Depth Anything V2 Core ML model to infer depth from a single image - no depth camera required. A Metal post-pass remaps the range, optionally inverts it, smooths over time, and can isolate a depth slab. Pair it with Depth Displacement or 3D Effects to push foreground and background apart.
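The per-pixel math of the remap and slab-isolation steps might look roughly like the sketch below. The real post-pass is a Metal shader whose formulas are not given here, so the linear remap, the smoothstep falloff and the function names are assumptions; only the parameter names and defaults come from the port table.

```python
def clamp01(v: float) -> float:
    return min(1.0, max(0.0, v))


def smoothstep(edge0: float, edge1: float, x: float) -> float:
    """Hermite step from 0 at edge0 to 1 at edge1."""
    if edge1 <= edge0:
        return 0.0 if x < edge0 else 1.0
    t = clamp01((x - edge0) / (edge1 - edge0))
    return t * t * (3.0 - 2.0 * t)


def remap_depth(d: float, near: float = 0.0, far: float = 1.0, invert: bool = False) -> float:
    """Map depth so `near` becomes white (1.0) and `far` becomes black (0.0)."""
    span = far - near
    t = clamp01((d - near) / span) if span != 0 else 0.0
    value = 1.0 - t
    return 1.0 - value if invert else value


def isolate(d: float, target: float = 0.5, width: float = 0.2, falloff: float = 0.08) -> float:
    """Return 1.0 inside the depth band, fading to 0.0 over `falloff` outside it."""
    distance = abs(d - target)
    half = width / 2.0
    return 1.0 - smoothstep(half, half + falloff, distance)


print(remap_depth(0.25, near=0.0, far=0.5))  # 0.5
print(isolate(0.5), isolate(0.9))            # 1.0 0.0
```

Narrowing `near` and `far` stretches a small depth range across the full black-to-white scale, which is often more useful for displacement than the raw map.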