Motion picture scenes to be colorized or depth-enhanced (2D→3D) are broken into separate elements: backgrounds/sets and motion/onscreen action. Background and motion elements are combined separately into single-frame representations of multiple frames, which become a visual reference database that includes data for all frame offsets, used later for the computer-controlled application of masks within a sequence of frames. Each pixel address within the database corresponds to a mask/lookup table address within the digital frame and to an X, Y, Z location in subsequent frames. Masks are applied to subsequent frames of motion objects using various differentiating image processing methods, including automated mask fitting of all masks or of single masks in an entire frame, Bézier and polygon tracing of selected regions with edge-detected shaping, and operator-directed detection of subsequent regions. Colors and/or depths are automatically applied to masks throughout a scene, both from the composite background and to motion objects.
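The composite-and-lookup idea described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation. It assumes grayscale frames stored as nested lists, known non-negative per-frame offsets (dx, dy) on the composite canvas, and a per-pixel median to merge overlapping samples; the median rule and the names `build_composite` and `mask_for_frame` are hypothetical choices made only for illustration.

```python
from statistics import median


def build_composite(frames, offsets):
    """Merge frames into one background canvas.

    offsets[i] is the assumed (dx, dy) position of frame i on the canvas.
    Overlapping samples are merged with a median (an illustrative choice).
    """
    h, w = len(frames[0]), len(frames[0][0])
    width = w + max(dx for dx, _ in offsets)
    height = h + max(dy for _, dy in offsets)
    # One list of samples per canvas pixel.
    samples = [[[] for _ in range(width)] for _ in range(height)]
    for frame, (dx, dy) in zip(frames, offsets):
        for y in range(h):
            for x in range(w):
                samples[y + dy][x + dx].append(frame[y][x])
    # Canvas pixels that no frame covers stay None.
    return [[median(s) if s else None for s in row] for row in samples]


def mask_for_frame(composite_mask, offset, frame_size):
    """Read a frame's mask back out of the composite via its stored offset."""
    dx, dy = offset
    w, h = frame_size
    return [[composite_mask[y + dy][x + dx] for x in range(w)]
            for y in range(h)]


# Two frames of a one-row scene; the camera pans right by one pixel.
frames = [[[10, 20, 30]], [[20, 30, 40]]]
offsets = [(0, 0), (1, 0)]
composite = build_composite(frames, offsets)

# Mask IDs painted once on the composite, then looked up per frame.
composite_mask = [[1, 1, 2, 2]]
frame1_mask = mask_for_frame(composite_mask, offsets[1], (3, 1))
```

Here a mask is designed once on the composite and each frame recovers its own mask purely from its offset, which is the role the abstract assigns to the reference database. A real system would also need to estimate the offsets and handle depth (Z), which this sketch leaves out.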