The system, method, and program of the invention capture actual physical gestures made by a participant during a chat room session, instant messaging session, or other real-time communication session between participants over a network, and automatically transmit a representation of the gestures to the other participants. Image processing software analyzes successive video images, received as input from a video camera, for an actual physical gesture made by a participant. When the analysis detects that a physical gesture has been made, the state of the gesture is also determined. The state of the gesture identifies whether it is a first occurrence of the gesture or a subsequent occurrence. An action, and a parameter for the action, are determined for the gesture and the particular state of the gesture. A command to the API of the communication software, such as chat room software, is then automatically generated; the command transmits a representation of the gesture to the participants through the communication software.
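The flow from detected gesture to generated command can be illustrated with a minimal sketch in Python. It is not the implementation of the invention: the image processing step is replaced by a list of per-frame gesture labels that is assumed to have been produced already, and the gesture names, the `ACTION_TABLE` contents, the `send_text` action, and the dictionary shape of the command are all hypothetical stand-ins for whatever a particular communication software API would accept.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

# The two gesture states named in the text.
FIRST = "first"
SUBSEQUENT = "subsequent"

# Hypothetical lookup: (gesture, state) -> (action, parameter for the action).
ACTION_TABLE: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("wave", FIRST): ("send_text", "*waves*"),
    ("wave", SUBSEQUENT): ("send_text", "*keeps waving*"),
    ("nod", FIRST): ("send_text", "*nods*"),
    ("nod", SUBSEQUENT): ("send_text", "*nods again*"),
}


@dataclass
class GestureTracker:
    """Remembers which gestures have already occurred in this session."""
    seen: Set[str] = field(default_factory=set)

    def state_for(self, gesture: str) -> str:
        # A gesture seen before is a subsequent occurrence; otherwise first.
        if gesture in self.seen:
            return SUBSEQUENT
        self.seen.add(gesture)
        return FIRST


def build_command(gesture: str, state: str) -> Optional[dict]:
    """Map a gesture and its state to a command (assumed dictionary form)."""
    entry = ACTION_TABLE.get((gesture, state))
    if entry is None:
        return None  # no action configured for this gesture/state pair
    action, parameter = entry
    return {"action": action, "parameter": parameter}


def process_frames(labels: List[Optional[str]]) -> List[dict]:
    """Turn per-frame gesture labels (None = no gesture) into commands."""
    tracker = GestureTracker()
    commands = []
    for label in labels:
        if label is None:
            continue
        command = build_command(label, tracker.state_for(label))
        if command is not None:
            commands.append(command)
    return commands


if __name__ == "__main__":
    # Stand-in for the output of the image processing software.
    for command in process_frames([None, "wave", "wave", "nod", None]):
        print(command)
```

In a real system, each resulting command would be passed to the API of the communication software rather than printed. Keying the table on both gesture and state keeps the choice of action and parameter in data rather than in branching code, which is one possible design among several.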