A sound produced at the location of a listener is captured by a microphone in each of a plurality of speaker devices. A server apparatus receives an audio signal of the captured sound from each of the speaker devices and calculates, for each speaker device, a distance difference between the distance from the location of the listener to that speaker device and the distance from the location of the listener to the speaker device closest to the listener. When one of the speaker devices emits a sound, the server apparatus receives an audio signal of the sound captured by and transmitted from each of the other speaker devices. The server apparatus calculates a speaker-to-speaker distance between the speaker device that has emitted the sound and each of the other speaker devices. The server apparatus calculates a layout configuration of the plurality of speaker devices based on the distance differences and the speaker-to-speaker distances.
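The two distance quantities named above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: it assumes the server apparatus has already extracted, from the received audio signals, arrival times on a common clock, and it assumes a fixed speed of sound. The function names `distance_differences` and `speaker_to_speaker_distance` and the constant `SPEED_OF_SOUND_M_PER_S` are hypothetical names introduced only for this illustration.

```python
# Assumed speed of sound in air at about 20 degrees C (not stated in the source).
SPEED_OF_SOUND_M_PER_S = 343.0


def distance_differences(arrival_times_s, speed=SPEED_OF_SOUND_M_PER_S):
    """Return, per speaker device, how much farther it is from the listener
    than the closest speaker device.

    arrival_times_s: times (seconds, common clock) at which the listener's
    sound reached each speaker device's microphone. The device with the
    earliest arrival is taken to be the closest one.
    """
    earliest = min(arrival_times_s)
    return [speed * (t - earliest) for t in arrival_times_s]


def speaker_to_speaker_distance(emit_time_s, capture_time_s,
                                speed=SPEED_OF_SOUND_M_PER_S):
    """Return the distance between an emitting speaker device and a capturing
    one, from the propagation delay of the emitted sound (assumed to be
    measured on a common clock)."""
    return speed * (capture_time_s - emit_time_s)


if __name__ == "__main__":
    # Listener's sound arrives at three speaker devices at these times.
    print(distance_differences([0.010, 0.012, 0.015]))
    # Sound emitted at t = 0 s is captured by another device 10 ms later.
    print(speaker_to_speaker_distance(0.0, 0.010))
```

Only differences of listener distances are computed because the instant at which the listener's sound is produced is unknown to the server apparatus, whereas the emission instant of a speaker device's own sound can be known. Deriving the layout configuration from these values is a separate step that this sketch does not attempt.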