An interactive, computer-based system for combining on a common display, as overlaid images, video signals from a source of video images (such as a video disc) with graphics and text from a computer, particularly for use in computer-aided instruction, computer-based information retrieval, and visual annotation or supplementation of video images. The video signals are converted to RGB format (or some other non-phase-modulated format) if not already so encoded. They are then supplied as one input to a switch, which receives the text and graphics signals, also in RGB format, as its other input. The output of the switch is supplied to a video display. The switch is controlled by an attribute of the text and graphics signals, so that the signal passed to the display is selected between the two inputs on a pixel-by-pixel basis. A keyboard or other device connected to the computer allows the user to provide input and, responsive thereto, the computer directs the operation of the video source, for example by changing frames on a video disc or by providing image enhancement or masking.
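
The pixel-by-pixel selection described above can be illustrated in software with a minimal sketch. The abstract does not say which attribute of the text and graphics signals controls the switch, so the sketch assumes a hypothetical one-bit `show_video` flag carried with each graphics pixel; the names `overlay_frame`, `RGB`, and `GraphicsPixel` are likewise illustrative and do not come from the source. Both inputs are assumed to be already in RGB format and of identical dimensions.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
# Hypothetical graphics pixel: an RGB color plus a one-bit attribute.
# show_video=True means "let the video pixel through at this position".
GraphicsPixel = Tuple[RGB, bool]


def overlay_frame(
    video: List[List[RGB]],
    graphics: List[List[GraphicsPixel]],
) -> List[List[RGB]]:
    """Select, for each pixel, either the video or the graphics input.

    The choice is driven only by the attribute carried in the graphics
    signal, mirroring a two-input switch feeding a single display.
    """
    if len(video) != len(graphics):
        raise ValueError("video and graphics frames differ in height")

    output: List[List[RGB]] = []
    for video_row, graphics_row in zip(video, graphics):
        if len(video_row) != len(graphics_row):
            raise ValueError("video and graphics rows differ in width")
        output.append([
            video_pixel if show_video else color
            for video_pixel, (color, show_video) in zip(video_row, graphics_row)
        ])
    return output


if __name__ == "__main__":
    # One row, two pixels: white text over the first, video through the second.
    video_frame = [[(10, 20, 30), (40, 50, 60)]]
    graphics_frame = [[((255, 255, 255), False), ((0, 0, 0), True)]]
    print(overlay_frame(video_frame, graphics_frame))
```

In this sketch the graphics signal alone decides what reaches the display, so text can be laid over a video image simply by clearing the flag wherever a character is drawn; a hardware switch would make the same decision at the pixel clock rate rather than in a loop.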