A method and system provide a face-to-face video conference utilizing a video mirror. The method and apparatus comprise a first station having a first predetermined sensory setting; a second station having a second predetermined sensory setting; and an imaging system for capturing an image or sub-image at the first station and displaying at least a portion of the image or sub-image at the second station such that it becomes generally visually integrated with the second predetermined sensory setting. Also disclosed are an apparatus and a method for effecting a face-to-face presence environment regardless of whether the first and second predetermined sensory settings are the same or different. The stations may be portable and/or modular such that they can be easily constructed or assembled. The stations may also be architecturally designed and/or decorated to further enhance the face-to-face environment created by the video conferencing system and method.