The method captures a 3D model of a face, which includes a 3D mesh and a series of deformations of the mesh that define changes in the position of the mesh over time (e.g., for each frame). The method also builds a texture map associated with each frame in an animation sequence. The method draws its main advantage from markers placed on an actor's face, which serve both to track the motion of the face over time and to establish a correspondence between the 3D model and the texture.

Specifically, videos of the actor's face with markers are captured from multiple cameras. Stereo matching is used to derive the 3D locations of the markers in each frame. A 3D scan is also performed on the actor's face with the markers applied, producing an initial mesh with markers. The markers from the 3D scan are matched with the 3D marker locations obtained for each frame by the stereo matching process. The method determines how the position of the mesh changes from frame to frame by matching the 3D locations of the markers from one frame to the next. The method derives the texture for each frame by removing the markers (dots) from the video data, finding a mapping between texture space and the 3D space of the mesh, and combining the camera views for each frame into a single texture map.

The data needed to represent facial animation includes: 1) an initial 3D mesh, 2) 3D deformations of the mesh per frame, and 3) a texture map associated with each deformation. The method compresses the 3D geometry by decomposing the deformation data into basis vectors and coefficients, and compresses the textures using video compression.
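The stereo matching step yields, for each marker, matched observations in two or more calibrated cameras, from which a 3D location must be recovered. The sketch below shows one simple way to do this, the midpoint method: it finds the closest points on the two viewing rays and returns their midpoint. It assumes each observation has already been back-projected into a ray (camera center plus direction) in a shared world frame. The function name `triangulate_midpoint` is hypothetical, and the midpoint method is a stand-in, since the text does not say which triangulation is used.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(u: Vec3, v: Vec3) -> Vec3:
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def _dot(u: Vec3, v: Vec3) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def triangulate_midpoint(c1: Vec3, d1: Vec3, c2: Vec3, d2: Vec3) -> Vec3:
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2.

    c1, c2: camera centers; d1, d2: ray directions through the matched marker.
    (Hypothetical inputs, assumed precomputed from calibrated cameras.)
    """
    w0 = _sub(c1, c2)
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; marker cannot be triangulated")
    # Ray parameters of the closest points, obtained by setting the gradient
    # of |w0 + s*d1 - t*d2|^2 with respect to s and t to zero.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = tuple(c1[i] + s * d1[i] for i in range(3))
    p2 = tuple(c2[i] + t * d2[i] for i in range(3))
    return tuple((p1[i] + p2[i]) / 2 for i in range(3))
```

For example, two cameras at `(0, 0, 0)` and `(4, 0, 0)` whose rays point toward `(1, 2, 3)` give `triangulate_midpoint((0, 0, 0), (1, 2, 3), (4, 0, 0), (-3, 2, 3))`, which returns `(1.0, 2.0, 3.0)`. With noisy observations the rays do not intersect exactly, and the midpoint is a reasonable compromise. With more than two cameras, a least-squares formulation over all rays would be the natural extension.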
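Tracking the mesh from frame to frame relies on knowing which marker in one frame corresponds to which marker in the next. The text does not specify how this matching is performed. The sketch below uses greedy nearest-neighbour matching under the assumption that markers move only a small distance between consecutive frames relative to their spacing. A production tracker would likely add motion prediction or a global assignment step. The names `match_markers` and `marker_displacements`, and the `max_dist` threshold, are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def match_markers(prev: List[Point], curr: List[Point], max_dist: float) -> Dict[int, int]:
    """Greedy nearest-neighbour correspondence {prev_index: curr_index}.

    Candidate pairs are taken in order of increasing distance. Each marker is
    used at most once, and pairs farther apart than max_dist are never matched
    (for example, a marker occluded in the current frame stays unmatched).
    """
    candidates = sorted(
        (math.dist(p, q), i, j)
        for i, p in enumerate(prev)
        for j, q in enumerate(curr)
        if math.dist(p, q) <= max_dist
    )
    matches: Dict[int, int] = {}
    used = set()
    for _, i, j in candidates:
        if i not in matches and j not in used:
            matches[i] = j
            used.add(j)
    return matches


def marker_displacements(prev: List[Point], curr: List[Point],
                         matches: Dict[int, int]) -> Dict[int, Point]:
    """Per-marker motion between the two frames, keyed by previous-frame index."""
    return {
        i: tuple(c - p for p, c in zip(prev[i], curr[j]))
        for i, j in matches.items()
    }
```

The resulting per-marker displacements are the sparse motion samples from which the mesh deformation for the frame would then be interpolated.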
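Combining the camera views into a single texture map means choosing, for every texel, a color from the cameras that can see the corresponding surface point. The text does not give a blending rule. The sketch below assumes a common choice: a weighted average where each camera's weight is the cosine of the angle between the surface normal and the direction toward that camera, so head-on views dominate and grazing views contribute little. It also assumes the markers have already been removed and the texel-to-surface mapping has already been used to sample each camera. The functions `blend_texel` and `blend_texture` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def blend_texel(samples: Sequence[Optional[Vec3]], normal: Vec3,
                view_dirs: Sequence[Vec3]) -> Optional[Vec3]:
    """Blend one texel's RGB colour from several cameras.

    samples[k]: colour camera k records at the texel's surface point, or None
    if the point is occluded in camera k.
    normal: unit surface normal at that point.
    view_dirs[k]: unit vector from the point toward camera k.
    """
    total = [0.0, 0.0, 0.0]
    weight_sum = 0.0
    for color, view in zip(samples, view_dirs):
        if color is None:
            continue
        # Cosine of the viewing angle, clamped at 0 so back-facing views are ignored.
        w = max(0.0, sum(n * v for n, v in zip(normal, view)))
        if w == 0.0:
            continue
        for ch in range(3):
            total[ch] += w * color[ch]
        weight_sum += w
    if weight_sum == 0.0:
        return None  # texel not seen by any camera; left for later hole filling
    return tuple(t / weight_sum for t in total)


def blend_texture(per_camera: Sequence[Sequence[Optional[Vec3]]],
                  normals: Sequence[Vec3],
                  view_dirs: Sequence[Sequence[Vec3]]) -> List[Optional[Vec3]]:
    """per_camera[k][t]: camera k's sample for texel t; view_dirs[t][k]: as above."""
    return [
        blend_texel([cam[t] for cam in per_camera], normals[t], view_dirs[t])
        for t in range(len(normals))
    ]
```

Because the weights vary smoothly with the viewing angle, this style of blending tends to avoid visible seams where one camera's coverage ends and another's begins.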
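Decomposing the deformation data into basis vectors and coefficients lets each frame be stored as a short coefficient vector instead of a full set of vertex positions. Principal component analysis is one common way to build such a basis, although the text does not name the decomposition. The sketch below assumes each frame's deformation is flattened into a vector of per-vertex displacements `(x1, y1, z1, x2, ...)`. It approximates frame f as `mean + sum_j coeffs[f][j] * basis[j]` and computes the top `k` components by power iteration with deflation, which avoids external dependencies. In practice an SVD routine such as `numpy.linalg.svd` would normally be used. All names here are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def _dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def compress_deformations(frames: List[Vector], k: int, iters: int = 200,
                          seed: int = 0) -> Tuple[Vector, List[Vector], List[Vector]]:
    """Return (mean, basis, coeffs) with at most k orthonormal basis vectors.

    Fewer than k vectors are returned if the centered data has lower rank.
    """
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(dim)]
    centered = [[f[j] - mean[j] for j in range(dim)] for f in frames]
    residual = [row[:] for row in centered]
    rng = random.Random(seed)
    basis: List[Vector] = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        found = False
        for _ in range(iters):
            # Power iteration on R^T R without forming the dim x dim matrix.
            proj = [_dot(row, v) for row in residual]
            w = [sum(p * row[j] for p, row in zip(proj, residual)) for j in range(dim)]
            norm = math.sqrt(_dot(w, w))
            if norm < 1e-12:
                break  # residual is numerically zero
            v = [x / norm for x in w]
            found = True
        if not found:
            break  # no variance left: fewer than k components are needed
        basis.append(v)
        # Deflate: remove the new direction so the next component is orthogonal to it.
        for row in residual:
            c = _dot(row, v)
            for j in range(dim):
                row[j] -= c * v[j]
    coeffs = [[_dot(row, b) for b in basis] for row in centered]
    return mean, basis, coeffs


def reconstruct(mean: Vector, basis: List[Vector], c: Vector) -> Vector:
    """Rebuild one frame's deformation vector from its coefficients."""
    out = list(mean)
    for cj, b in zip(c, basis):
        for j in range(len(out)):
            out[j] += cj * b[j]
    return out
```

Choosing `k` trades fidelity for size. Facial deformations are typically highly correlated across vertices, so a small `k` often captures most of the motion, and the per-frame coefficients are far smaller than the raw vertex data.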