A video entertainment system by which human viewers conduct simulated voice conversations with screen actors or cartoon characters in a prerecorded branching movie shown on a television screen. The actors and cartoon characters reply with lip-sync sound to words spoken by viewers. Different audio and video frames are addressed on a videodisc to provide one of several alternative replies or alternative actions at each branch point in the movie, depending on what the viewer says to a speech-recognition unit. A simple speech-recognition unit can be used because only a few words need to be recognized at each branch point. A menu of prompting words is displayed on a hand-held unit to inform viewers of the words they can use at each branch point. The prompting words are programmed to be phonetically distinct so that they are easily distinguished from each other. Viewers can ask questions or make other remarks by speaking a displayed code word that stands for a whole sentence. Pressing a button next to a sentence displayed on the hand-held unit may cause a recording of that sentence to be played in lieu of the viewer speaking it. Viewers can chat with simulated images of famous people, call the plays in a ball game, make executive decisions as a king or general, and participate in simulated adventures with interesting characters who respond to each viewer's words.
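
The branch-point selection described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the names `Reply`, `BranchPoint`, and `MOVIE`, the frame addresses, and the prompting words are all hypothetical, and a text-similarity match from Python's standard `difflib` module stands in for the speech-recognition unit. The sketch only shows how restricting each branch point to a small menu of words reduces recognition to picking the closest of a few candidates, each of which addresses a different run of frames.

```python
from dataclasses import dataclass
from difflib import get_close_matches
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Reply:
    first_frame: int            # hypothetical videodisc frame where the reply starts
    last_frame: int             # hypothetical videodisc frame where the reply ends
    next_branch: Optional[str]  # name of the following branch point, if any


@dataclass(frozen=True)
class BranchPoint:
    # Maps each prompting word shown on the hand-held unit to one reply.
    replies: Dict[str, Reply]

    def menu(self) -> Tuple[str, ...]:
        """Prompting words to display at this branch point."""
        return tuple(self.replies)

    def select(self, heard: str) -> Optional[Reply]:
        """Pick the reply whose prompting word best matches the input.

        Assumption: string similarity is a stand-in for acoustic matching.
        Returns None when nothing on the menu is close enough.
        """
        matches = get_close_matches(heard.lower(), self.menu(), n=1, cutoff=0.6)
        return self.replies[matches[0]] if matches else None


# Hypothetical two-step branching movie.
MOVIE: Dict[str, BranchPoint] = {
    "greeting": BranchPoint({
        "hello": Reply(1000, 1450, "question"),
        "goodbye": Reply(1451, 1800, None),
    }),
    "question": BranchPoint({
        "where": Reply(2000, 2600, None),
        "danger": Reply(2601, 3100, None),
    }),
}

branch = MOVIE["greeting"]
print("Menu:", branch.menu())
print("Selected:", branch.select("helo"))  # imperfect input still resolves to a menu word
```

Because each menu holds only a few dissimilar words, even an imperfectly recognized input maps to a single candidate, while an input that resembles none of them selects nothing and the viewer can be prompted again.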