The user selects a region of interest by moving either the object or the camera so that the region sits at the center of the image. Starting from a rudimentary foreground-background segmentation, a circular lump about the center of the screen, the algorithm repeatedly builds a model of color likelihood given a segmentation label (a value between 0 and 255), then relabels each pixel with its most likely label. At the end of each pass the label image is smoothed with a small Gaussian kernel. Passes are synchronized with the grabbing of new frames from the camera, so the label image from the previous frame becomes the prior labels for the next frame, exploiting temporal coherence.
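The loop described above can be sketched roughly as follows. This is a simplified illustration, not the sketch's actual source: it assumes two classes (foreground where the label is at least 128), coarse RGB histograms as the color model, and a separable [1, 2, 1] / 4 kernel standing in for the small Gaussian. The names `initial_labels`, `quantize`, and `segmentation_pass`, along with the `bins` and `radius` parameters, are hypothetical.

```python
def initial_labels(width, height, radius):
    """Rudimentary prior: 255 inside a disc about the image center, 0 outside."""
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    return [[255 if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2 else 0
             for x in range(width)] for y in range(height)]


def quantize(pixel, bins=4):
    """Map an (r, g, b) pixel with 0..255 channels to a coarse histogram bin."""
    step = 256 // bins
    r, g, b = pixel
    return (r // step, g // step, b // step)


def segmentation_pass(image, labels, bins=4):
    """One pass: fit per-class color histograms, relabel, then smooth."""
    height, width = len(image), len(image[0])

    # 1. Build color histograms for foreground (label >= 128) and background.
    counts = {True: {}, False: {}}
    totals = {True: 0, False: 0}
    for y in range(height):
        for x in range(width):
            fg = labels[y][x] >= 128
            key = quantize(image[y][x], bins)
            counts[fg][key] = counts[fg].get(key, 0) + 1
            totals[fg] += 1

    def likelihood(key, fg):
        # Laplace smoothing keeps unseen colors from getting zero probability.
        return (counts[fg].get(key, 0) + 1) / (totals[fg] + bins ** 3)

    # 2. Relabel each pixel with the class whose color model explains it best.
    hard = []
    for y in range(height):
        row = []
        for x in range(width):
            key = quantize(image[y][x], bins)
            row.append(255 if likelihood(key, True) > likelihood(key, False) else 0)
        hard.append(row)

    # 3. Smooth with a separable [1, 2, 1] / 4 kernel; border samples are clamped.
    def blur_rows(grid):
        return [[(r[max(x - 1, 0)] + 2 * r[x] + r[min(x + 1, len(r) - 1)]) // 4
                 for x in range(len(r))] for r in grid]

    def transpose(grid):
        return [list(col) for col in zip(*grid)]

    return transpose(blur_rows(transpose(blur_rows(hard))))
```

In use, the labels would be seeded once with `initial_labels` and then fed back in for every new camera frame (`labels = segmentation_pass(frame, labels)`), which is how the previous frame's result acts as the prior for the next.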
Sharing information across both space and time allows the algorithm to track moving regions of interest even under drastic appearance changes. The trade-off is that the region of interest occasionally shifts in undesirable ways. It is also possible, though uncommon, for the region of interest to become disconnected: in the right image, several distinct blobs are visible on the door.
To create visual emphasis, areas outside the region of interest are darkened and slightly blurred.
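One way such an emphasis effect could be produced is to blend each pixel between its original value and a darkened, blurred copy, weighted by the label image. The sketch below assumes a grayscale image, a 3x3 box blur, and a darkening factor of 0.5; none of these values come from the original sketch, and the `emphasize` function is hypothetical.

```python
def emphasize(gray, labels, darken=0.5):
    """Blend each pixel toward a darkened, box-blurred copy as its label falls."""
    height, width = len(gray), len(gray[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # 3x3 box blur; the window is truncated at the image border.
            ys = range(max(y - 1, 0), min(y + 2, height))
            xs = range(max(x - 1, 0), min(x + 2, width))
            window = [gray[j][i] for j in ys for i in xs]
            blurred = sum(window) / len(window)
            # Weight is 1.0 inside the region of interest and 0.0 well outside it.
            w = labels[y][x] / 255.0
            row.append(round(w * gray[y][x] + (1.0 - w) * darken * blurred))
        out.append(row)
    return out
```

Because the label image has already been smoothed, using it directly as the blend weight gives a soft transition at the edge of the region rather than a hard outline.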
Source and binary (128k, requires QuickTime for camera access): http://adamsmith.as/typ0/sketch_070