A speech sample is recognized with a computer system by processing the speech sample with at least two speech recognizers, each of which has a different performance characteristic. One speech recognizer may be a large-vocabulary, continuous speech recognizer optimized for real-time responsiveness, and another may be an offline recognizer optimized for high accuracy. The speech content of the sample is recognized based on processing results from the speech recognizers. The speaker is provided with a real-time, yet potentially error-laden, text display corresponding to the speech sample; subsequently, a human transcriptionist may use recognition results from multiple recognizers to produce an essentially error-free transcription. The performance characteristics of the recognizers may be based on style or subject matter, and the recognizers may operate serially or in parallel. Sets of candidates produced by the two recognizers may be combined by merging their scores to generate a combined set of candidates that corresponds to the union of the two sets. Offline processing may be performed based on input from a human operator, cost, processing times, confidence levels, or importance. A candidate may be treated as uncertain when the difference between the score of the best-scoring candidate and the score of the second-best-scoring candidate is less than a threshold value. A graphical user interface may allow the user to selectively transmit the speech sample to another speech recognizer (or to restrict such transmission), based on document type or availability of the second speech recognizer.
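
The candidate merging and the uncertainty test described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the function names `merge_candidates` and `is_uncertain`, the `weight` parameter, the weighted-average merging rule, the rule that a candidate found by only one recognizer keeps its single score, and the convention that a higher score is better are all assumptions made for illustration.

```python
from typing import Dict


def merge_candidates(
    fast: Dict[str, float],
    offline: Dict[str, float],
    weight: float = 0.5,
) -> Dict[str, float]:
    """Return the union of two candidate sets with merged scores.

    Assumption: scores from both recognizers are on a comparable scale
    and a higher score means a better candidate.
    """
    merged: Dict[str, float] = {}
    for text in fast.keys() | offline.keys():
        if text in fast and text in offline:
            # Assumed merging rule: weighted average of the two scores.
            merged[text] = weight * fast[text] + (1.0 - weight) * offline[text]
        else:
            # Assumed rule: a candidate produced by only one recognizer
            # keeps the score that recognizer gave it.
            merged[text] = fast[text] if text in fast else offline[text]
    return merged


def is_uncertain(candidates: Dict[str, float], threshold: float) -> bool:
    """True when the best and second-best scores differ by less than threshold."""
    ranked = sorted(candidates.values(), reverse=True)
    if len(ranked) < 2:
        # With fewer than two candidates there is no runner-up to compare.
        return False
    return (ranked[0] - ranked[1]) < threshold


# Hypothetical scores from a real-time recognizer and an offline recognizer.
fast_scores = {"recognize speech": 0.6, "wreck a nice beach": 0.5}
offline_scores = {"recognize speech": 0.9, "recognise peach": 0.3}

combined = merge_candidates(fast_scores, offline_scores)
needs_review = is_uncertain(combined, threshold=0.1)
```

In this sketch the combined set contains all three distinct candidates, and a result flagged by `is_uncertain` could, for example, be one of the signals used to decide whether offline processing or human review is warranted.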