An image of a gesture input region is captured and acquired, and an object making a gesture is detected in the image. From the image, an intersection position is detected at which the detected object crosses a determination region, that is, the region used to determine the position of the object relative to the gesture input region. A base position for the gesture made by the object is computed from the intersection position. The position of the target manipulated by the gesture is determined as a manipulation target position. A gesture coordinate system, distinct from the coordinate system of the image, is then determined from the base position and the manipulation target position.
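The geometric steps after object detection can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state. The detected object (for example an arm) is already available as an ordered list of image points. The boundary of the determination region is modeled as a horizontal line `y = line_y` in image coordinates. The base position is taken to be the intersection plus an optional fixed offset. The gesture coordinate system places its origin at the base position, with its x-axis pointing toward the manipulation target. All names (`find_intersection`, `base_position`, `GestureFrame`, `make_gesture_frame`) are hypothetical and do not come from the source.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def find_intersection(polyline: Sequence[Point], line_y: float) -> Optional[Point]:
    """Return the first point where the object's polyline crosses y = line_y.

    Hypothetical model: the determination region's boundary is a horizontal line
    in image coordinates, and the detected object is an ordered list of points.
    """
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        if y0 == y1:
            continue  # horizontal segment: no single crossing point
        if (y0 - line_y) * (y1 - line_y) <= 0:  # endpoints on opposite sides (or touching)
            t = (line_y - y0) / (y1 - y0)  # interpolation parameter in [0, 1]
            return (x0 + t * (x1 - x0), line_y)
    return None


def base_position(intersection: Point, offset: Point = (0.0, 0.0)) -> Point:
    """Derive the gesture's base position from the intersection (assumed: fixed offset)."""
    return (intersection[0] + offset[0], intersection[1] + offset[1])


@dataclass
class GestureFrame:
    origin: Point  # base position
    x_axis: Point  # unit vector from the base toward the manipulation target
    y_axis: Point  # unit vector perpendicular to x_axis

    def to_gesture(self, p: Point) -> Point:
        """Express an image point in the gesture coordinate system."""
        dx, dy = p[0] - self.origin[0], p[1] - self.origin[1]
        return (dx * self.x_axis[0] + dy * self.x_axis[1],
                dx * self.y_axis[0] + dy * self.y_axis[1])


def make_gesture_frame(base: Point, target: Point) -> GestureFrame:
    """Build a frame with origin at the base and x-axis pointing at the target."""
    vx, vy = target[0] - base[0], target[1] - base[1]
    norm = math.hypot(vx, vy)
    if norm == 0:
        raise ValueError("base and manipulation target positions must differ")
    ux, uy = vx / norm, vy / norm
    return GestureFrame(origin=base, x_axis=(ux, uy), y_axis=(-uy, ux))


# Usage with hypothetical data: an arm polyline running up the image.
arm = [(100.0, 300.0), (120.0, 200.0), (140.0, 100.0)]
hit = find_intersection(arm, line_y=200.0)  # (120.0, 200.0)
frame = make_gesture_frame(base_position(hit), target=(220.0, 200.0))
print(frame.to_gesture((170.0, 250.0)))  # (50.0, 50.0)
```

Because the frame is anchored to the base position and the manipulation target, not to the image axes, a given hand motion relative to the target maps to the same gesture coordinates wherever the object enters the determination region. In a real system, the horizontal line would be replaced by whatever region shape the determination region actually has.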