A system and method for improving the accuracy of audio searching by processing an audio file or stream with multiple models to obtain multiple search tracks. Each search track is searched for at least one search term, producing one search result per track, so the number of search results equals the number of models used to process the audio. The search results are then combined into a unified search result. The models may represent different languages, dialects, and accents.
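The flow described above (one search track per model, one search result per track, then a single unified result) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method. The abstract does not define the track format or the combination rule. The sketch assumes that a search track is a list of time-stamped term hits with confidence scores. It also assumes that combining results means merging hits from different models that fall close together in time and keeping the highest-scoring one. The names `Hit`, `search`, `unify`, the `tolerance` parameter, and the model labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One occurrence of a term in a search track (hypothetical format)."""
    term: str
    start: float   # offset into the audio, in seconds
    score: float   # model confidence, assumed to lie in [0, 1]


def search(track: list[Hit], term: str) -> list[Hit]:
    """Locate `term` in one model's search track, yielding one search result."""
    return [h for h in track if h.term == term]


def unify(results: list[list[Hit]], tolerance: float = 0.5) -> list[Hit]:
    """Combine per-model results for a single term into one unified result.

    Assumed rule: hits whose start times fall within `tolerance` seconds of
    the first hit in a cluster are treated as the same occurrence, and only
    the highest-scoring hit in each cluster is kept.
    """
    hits = sorted((h for r in results for h in r), key=lambda h: h.start)
    unified: list[Hit] = []
    anchor = None  # start time of the first hit in the current cluster
    for h in hits:
        if anchor is not None and h.start - anchor <= tolerance:
            # Same occurrence reported by another model: keep the better hit.
            if h.score > unified[-1].score:
                unified[-1] = h
        else:
            # A new occurrence begins a new cluster.
            anchor = h.start
            unified.append(h)
    return unified


# Hypothetical search tracks from three accent models over the same audio.
tracks = {
    "en-US": [Hit("invoice", 12.0, 0.62), Hit("total", 30.1, 0.90)],
    "en-GB": [Hit("invoice", 12.2, 0.81)],
    "en-IN": [Hit("invoice", 47.5, 0.55)],
}

# One search result per model, then a single unified result.
per_model = [search(track, "invoice") for track in tracks.values()]
unified = unify(per_model)
print(unified)
```

Keeping the best-scoring hit per cluster is only one plausible combination rule; voting across models or averaging confidences would fit the same structure by changing the body of `unify`.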