A live video insertion system (LVIS) is disclosed that inserts static or dynamic images into a live video broadcast realistically and in real time. Initially, natural landmarks in a scene that are suitable for subsequent detection and tracking are selected. The landmarks are typically distributed throughout the entire scene, such as a ballpark or football stadium. The field of view of the camera at any instant is normally significantly smaller than the full scene that may be panned. The LVIS uses a combination of pattern recognition techniques and camera sensor data (e.g., pan, tilt, and zoom) to locate, verify, and track target data. Camera sensor data is well suited to the search requirements of an LVIS, while pattern recognition and landmark tracking techniques are better suited to its image tracking requirements.
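
By way of illustration only, the division of labor between camera sensor data and pattern recognition may be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: it assumes a simplified pinhole camera in which pan and tilt offsets map independently to pixel offsets, zoom expressed as a focal length in pixels, grayscale images stored as nested lists, and a sum-of-absolute-differences match as a stand-in for whatever pattern recognition technique is actually used. The names `predict_pixel`, `refine`, `locate_landmark`, `radius`, and `max_sad` are hypothetical.

```python
import math


def predict_pixel(lm_pan_deg, lm_tilt_deg, cam_pan_deg, cam_tilt_deg,
                  focal_px, width, height):
    """Coarse search: estimate where a landmark falls in the image from
    pan/tilt/zoom readings (assumed simplified pinhole model)."""
    dx = math.tan(math.radians(lm_pan_deg - cam_pan_deg)) * focal_px
    dy = math.tan(math.radians(lm_tilt_deg - cam_tilt_deg)) * focal_px
    # Image y grows downward, so a landmark above the optical axis
    # (positive tilt offset) has a smaller y coordinate.
    return (round(width / 2 + dx), round(height / 2 - dy))


def refine(image, template, guess, radius):
    """Fine tracking: search a small window around the sensor-based guess
    for the template position with the lowest sum of absolute differences.
    Positions refer to the template's top-left corner."""
    th, tw = len(template), len(template[0])
    gx, gy = guess
    best = None
    for y in range(max(0, gy - radius), min(len(image) - th, gy + radius) + 1):
        for x in range(max(0, gx - radius),
                       min(len(image[0]) - tw, gx + radius) + 1):
            sad = sum(abs(image[y + j][x + i] - template[j][i])
                      for j in range(th) for i in range(tw))
            if best is None or sad < best[0]:
                best = (sad, x, y)
    return best  # (score, x, y), or None if the window lies off-image


def locate_landmark(image, template, landmark_angles, sensors,
                    radius=3, max_sad=10):
    """Locate with sensors, refine with pattern matching, then verify
    the match against an assumed score threshold."""
    lm_pan, lm_tilt = landmark_angles
    cam_pan, cam_tilt, focal_px = sensors
    guess = predict_pixel(lm_pan, lm_tilt, cam_pan, cam_tilt,
                          focal_px, len(image[0]), len(image))
    match = refine(image, template, guess, radius)
    if match is None or match[0] > max_sad:
        return None  # verification failed: landmark not confirmed
    return (match[1], match[2])
```

In this sketch the sensor reading only narrows the search to a small window, so a modest sensor error is tolerated, while the image match supplies the pixel-accurate position and the threshold serves as the verification step.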