Described is a behavior recognition system for detecting the behavior of objects in a scene. The system comprises a semantic object stream module that receives a video stream having at least two frames and detects objects in the video stream. Also included is a group organization module that uses the detected objects to detect their behavior. The group organization module comprises an object group stream module, which spatially organizes the detected objects according to their relative spatial relationships, and a group action stream module, which models a temporal structure of the detected objects. The temporal structure is an action of the detected objects between the at least two frames. By detecting the objects, organizing them spatially, and modeling their actions, the system allows a user to detect the behavior of the objects.
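The three stages described above (object detection, spatial organization, temporal modeling) can be illustrated with a minimal sketch. It is not the disclosed implementation. The names `Detection`, `spatial_relations`, and `describe_behavior` are hypothetical, detections are assumed to be already available as labeled 2-D points with one object per label in each frame, and the distance-change rule with its `tol` threshold is an assumed stand-in for whatever temporal model the system actually uses.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Detection:
    """One detected object in a frame (hypothetical output of the object stream)."""
    label: str
    x: float
    y: float


def spatial_relations(frame):
    """Spatial organization: relative offset between every pair of objects.

    Assumes each label appears at most once per frame. Pairs are keyed in
    alphabetical label order so that keys match across frames.
    """
    ordered = sorted(frame, key=lambda d: d.label)
    return {
        (a.label, b.label): (b.x - a.x, b.y - a.y)
        for a, b in combinations(ordered, 2)
    }


def describe_behavior(frame_a, frame_b, tol=0.5):
    """Temporal structure: how each pairwise distance changes between two frames.

    The three-way rule below is an illustrative assumption, not the source's model.
    """
    before = spatial_relations(frame_a)
    after = spatial_relations(frame_b)
    behavior = {}
    for pair in before.keys() & after.keys():
        d0 = math.hypot(*before[pair])
        d1 = math.hypot(*after[pair])
        if d1 < d0 - tol:
            behavior[pair] = "approaching"
        elif d1 > d0 + tol:
            behavior[pair] = "separating"
        else:
            behavior[pair] = "holding"
    return behavior
```

Under these assumptions, passing two frames in which a person moves from (0, 0) to (4, 0) while a car stays at (10, 0) labels the pair `("car", "person")` as `"approaching"`. Keying behavior on pairs rather than on individual objects reflects the group-level organization described above.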