Methods and systems for time-frequency domain watermarking of media signals, such as audio and video signals. An encoding method divides the media signal into segments, transforms each segment into a time-frequency representation, and computes a time-frequency domain watermark signal based on that time-frequency representation. It then combines the time-frequency domain watermark signal with the media signal to produce a watermarked media signal. To embed a message using this method, one may use peak modulation, pseudorandom noise modulation, statistical feature modulation, or a similar modulation technique. Watermarking in the time-frequency domain enables the encoder to perceptually model time and frequency attributes of the media signal simultaneously. A watermark decoder uses a calibration signal to detect the watermark signal in a potentially distorted version of the watermarked signal. The calibration signal may also be used to determine the watermark's alignment and scaling. After compensating for the alignment and scaling, a watermark reader extracts an embedded message from a time-frequency representation of the media signal.
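As a rough illustration of the encoding steps described above, the following Python sketch segments a one-dimensional signal, transforms each segment to a time-frequency representation, derives a watermark from that representation using pseudorandom noise modulation, and combines it with the signal. Everything specific here is an assumption made for illustration and is not taken from the described method: the function names (`to_time_frequency`, `from_time_frequency`, `pn_pattern`, `embed`, `detect`), the use of non-overlapping frames with a plain FFT, the multiplicative ±1 pattern (a crude stand-in for a perceptual model, since the watermark scales with local time-frequency energy), and the log-magnitude correlation detector. The sketch carries a single bit and omits the calibration signal and the alignment and scaling compensation.

```python
import numpy as np


def to_time_frequency(signal, frame_len):
    """Split the signal into non-overlapping frames and FFT each frame.

    Samples beyond the last whole frame are dropped (illustrative choice).
    """
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.fft.rfft(frames, axis=1)


def from_time_frequency(tf, frame_len):
    """Invert each frame's spectrum and concatenate the frames."""
    return np.fft.irfft(tf, n=frame_len, axis=1).reshape(-1)


def pn_pattern(shape, key):
    """Key-seeded pseudorandom +/-1 pattern over the time-frequency grid."""
    rng = np.random.default_rng(key)
    return rng.choice([-1.0, 1.0], size=shape)


def embed(signal, bit, key, frame_len=256, strength=0.1):
    """Embed one bit by scaling each time-frequency bin up or down.

    The watermark is strength * sign * pattern * tf, so it is computed
    from the time-frequency representation and grows with local energy.
    """
    tf = to_time_frequency(signal, frame_len)
    pattern = pn_pattern(tf.shape, key)
    sign = 1.0 if bit else -1.0
    watermark = strength * sign * pattern * tf
    return from_time_frequency(tf + watermark, frame_len)


def detect(signal, key, frame_len=256):
    """Recover the bit by correlating centered log-magnitudes with the pattern."""
    tf = to_time_frequency(signal, frame_len)
    pattern = pn_pattern(tf.shape, key)
    log_mag = np.log(np.abs(tf) + 1e-12)
    score = np.sum(pattern * (log_mag - log_mag.mean()))
    return bool(score > 0)
```

With a test signal such as `np.random.default_rng(0).standard_normal(16384)`, calling `detect(embed(signal, True, key=42), key=42)` would be expected to recover the embedded bit, provided the same key and frame length are used on both sides and the signal has not been shifted or rescaled.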