A system and method for producing highly realistic video images that depict the appearance of a simulated structure in an actual environment. The system and method provide for accurate placement of the structure in the environment and for matching the perspective of the structure with that of the environment. The system includes a video input means, such as a video camera and video recorder, by which a video image of the actual environment may be captured. A graphics processor unit receives the video image from the video input means and stores it in rasterized form. Field data input means is provided for receiving field location data regarding the precise location of the captured video image and field perspective data regarding the perspective of the captured image. Object data input means is also provided for receiving data, such as CAD data, for a three-dimensional model of a simulated object that is proposed to be included in the environment. From the three-dimensional model of the object, a two-dimensional perspective representation of the object is generated that accurately matches the perspective of the captured video image. The generated two-dimensional perspective representation is then merged with the rasterized video image and accurately positioned at its proper location in the environment.
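The generation of a two-dimensional perspective representation from a three-dimensional model, and its merging with a rasterized image, may be illustrated by the following minimal sketch. It is not the disclosed implementation. It assumes a simple pinhole camera model, world coordinates with x east, y north and z up, a camera heading measured clockwise from north, and a frame held as a list of pixel rows. The names `project_point` and `merge_into_frame` and all parameters are hypothetical and are introduced only for illustration.

```python
import math


def project_point(point, camera_pos, heading_deg, pitch_deg,
                  focal_px, width, height):
    """Project a world point to pixel coordinates (column, row).

    Assumed pinhole model: heading is clockwise from north (+y),
    pitch is the upward tilt. Returns None if the point lies behind
    the camera.
    """
    h = math.radians(heading_deg)
    p = math.radians(pitch_deg)

    # Orthonormal camera axes expressed in world coordinates.
    right = (math.cos(h), -math.sin(h), 0.0)
    up = (-math.sin(h) * math.sin(p), -math.cos(h) * math.sin(p), math.cos(p))
    forward = (math.sin(h) * math.cos(p), math.cos(h) * math.cos(p), math.sin(p))

    # Vector from the camera to the point.
    d = tuple(pc - cc for pc, cc in zip(point, camera_pos))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    depth = dot(d, forward)
    if depth <= 0:
        return None  # behind the camera, not visible

    # Perspective divide, then shift to the image centre.
    col = width / 2 + focal_px * dot(d, right) / depth
    row = height / 2 - focal_px * dot(d, up) / depth
    return (col, row)


def merge_into_frame(frame, pixels, value=255):
    """Return a copy of the raster with the projected pixels overwritten.

    Only the projected vertices are drawn here; a fuller renderer would
    fill the model's faces and handle occlusion.
    """
    merged = [row[:] for row in frame]
    for px in pixels:
        if px is None:
            continue
        col, row = round(px[0]), round(px[1])
        if 0 <= row < len(merged) and 0 <= col < len(merged[0]):
            merged[row][col] = value
    return merged


# Hypothetical field data: camera at the origin, looking north, level.
WIDTH, HEIGHT, FOCAL = 200, 100, 100.0
model_vertices = [(0.0, 10.0, 0.0), (1.0, 10.0, 0.0), (0.0, 10.0, 1.0)]
projected = [
    project_point(v, (0.0, 0.0, 0.0), 0.0, 0.0, FOCAL, WIDTH, HEIGHT)
    for v in model_vertices
]
background = [[0] * WIDTH for _ in range(HEIGHT)]
composite = merge_into_frame(background, projected)
```

In this sketch the field location data corresponds to `camera_pos`, and the field perspective data corresponds to `heading_deg`, `pitch_deg` and `focal_px`. Using the same values for the simulated object as were recorded for the captured image is what keeps the two perspectives matched.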