1
John Walter McDonough Jr, Volker Sebastian Leutnant, Sri Venkata Surya Siva Rama Krishna Garimella, Spyridon Matsoukas: Determining speaker direction using a spherical microphone array. Amazon Technologies, Seyfarth Shaw, Ilan N Barzilay, Tyrus S Cartwright, January 31, 2017: US09560441 (16 worldwide citations)

A system that detects audio including speech using a spherical sensor array estimates a direction of arrival of the speech using a Kalman filter. To improve the estimates of the Kalman filter, the system estimates a noise covariance matrix, representing noise detected by the array. The structure of ...
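The abstract names two standard ingredients: a Kalman filter that smooths direction-of-arrival (DOA) estimates, and an estimated noise covariance matrix for the array. The Python sketch below shows each in its simplest textbook form and is not the patented method. It assumes a scalar random-walk model for azimuth in degrees, a fixed measurement variance `r`, and real-valued noise-only frames for the covariance estimate. The truncated abstract does not say how the estimated covariance feeds the filter, so the two helpers are left independent. All names (`wrap_deg`, `AzimuthKalman`, `sample_covariance`) are hypothetical.

```python
def wrap_deg(a):
    """Wrap an angle in degrees into the interval [-180, 180)."""
    return (a + 180.0) % 360.0 - 180.0


class AzimuthKalman:
    """Hypothetical scalar Kalman filter tracking speaker azimuth (degrees).

    Assumes a random-walk state model: the azimuth stays put between frames
    except for process noise with variance q. The measurement variance r is
    an assumed constant here.
    """

    def __init__(self, x0, p0, q, r):
        self.x = x0  # current azimuth estimate (degrees)
        self.p = p0  # variance of the azimuth estimate
        self.q = q   # process noise variance (how fast the talker may move)
        self.r = r   # measurement noise variance of the raw DOA estimates

    def step(self, z):
        """Fuse one raw DOA measurement z (degrees) and return the new estimate."""
        # Predict: random walk, so the mean is unchanged and the variance grows by q.
        p_pred = self.p + self.q
        # Innovation measured on the circle, so 179 and -179 are 2 degrees apart.
        innovation = wrap_deg(z - self.x)
        # Kalman gain balances prediction uncertainty against measurement noise.
        k = p_pred / (p_pred + self.r)
        self.x = wrap_deg(self.x + k * innovation)
        self.p = (1.0 - k) * p_pred
        return self.x


def sample_covariance(frames):
    """Unbiased sample covariance of noise-only frames.

    frames: list of equal-length per-channel vectors (one vector per frame),
    assumed real-valued here for simplicity.
    """
    n = len(frames)
    m = len(frames[0])
    mean = [sum(f[i] for f in frames) / n for i in range(m)]
    cov = [[0.0] * m for _ in range(m)]
    for f in frames:
        d = [f[i] - mean[i] for i in range(m)]
        for i in range(m):
            for j in range(m):
                cov[i][j] += d[i] * d[j]
    return [[c / (n - 1) for c in row] for row in cov]


# Example: smoothing a few noisy raw DOA readings.
# kf = AzimuthKalman(x0=0.0, p0=10.0, q=0.5, r=4.0)
# for z in [12.0, 9.0, 11.5, 10.2]:
#     kf.step(z)
```

Measuring the innovation on the circle keeps a talker who drifts from 179 to -179 degrees from being treated as a 358-degree jump; a larger `r` makes the filter trust raw DOA estimates less and smooth more heavily.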


2
Rohit Prasad, Kenneth John Basye, Spyridon Matsoukas, Rajiv Ramachandran, Shiv Naga Prasad Vitaladevuni, Bjorn Hoffmeister: Keyword detection modeling using contextual and environmental information. Amazon Technologies, Knobbe Martens Olson & Bear, July 4, 2017: US09697828 (8 worldwide citations)

Features are disclosed for detecting words in audio using environmental information and/or contextual information in addition to acoustic features associated with the words to be detected. A detection model can be generated and used to determine whether a particular word, such as a keyword or “wake ...
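As a rough illustration of using context alongside acoustic evidence, the sketch below feeds an acoustic keyword score and a dictionary of context features into one logistic model and thresholds the result. This is an assumption made for illustration only: the truncated abstract does not say what form the detection model takes, and the function names, feature names, and weights here are hypothetical (in practice the weights would be learned from labeled data).

```python
import math


def keyword_posterior(acoustic_score, context, weights, bias):
    """Hypothetical logistic combination of acoustic and context evidence.

    acoustic_score: e.g. a log-likelihood ratio from an acoustic keyword model.
    context: dict of hypothetical environmental/contextual feature values.
    weights: dict with an "acoustic" weight plus one weight per context feature;
             features without a weight contribute nothing.
    """
    z = bias + weights["acoustic"] * acoustic_score
    for name, value in context.items():
        z += weights.get(name, 0.0) * value
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def detect_keyword(acoustic_score, context, weights, bias, threshold=0.5):
    """Return True if the combined posterior reaches the decision threshold."""
    return keyword_posterior(acoustic_score, context, weights, bias) >= threshold


# Example: the same borderline acoustic score is accepted only when a
# (hypothetical) context feature says the device was recently in use.
# w = {"acoustic": 1.0, "device_recently_active": 2.0}
# detect_keyword(1.0, {"device_recently_active": 1.0}, w, bias=-2.0)  # True
# detect_keyword(1.0, {}, w, bias=-2.0)                               # False
```

Keeping the acoustic score as just one input lets contextual features shift the decision boundary without retraining the acoustic model itself.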


3
Spyridon Matsoukas, Nikko Ström, Ariya Rastrow, Sri Venkata Surya Siva Rama Krishna Garimella: Generative modeling of speech using neural networks. Amazon Technologies, Knobbe Martens Olson & Bear, May 16, 2017: US09653093 (2 worldwide citations)

Features are disclosed for using an artificial neural network to generate customized speech recognition models during the speech recognition process. By dynamically generating the speech recognition models during the speech recognition process, the models can be customized based on the specific cont ...


4
Ariya Rastrow, Nikko Ström, Spyridon Matsoukas, Markus Dreyer, Ankur Gandhe, Denis Sergeyevich Filimonov, Julian Chan, Rohit Prasad: Speech processing with learned representation of user interaction history. Amazon Technologies, Knobbe Martens Olson & Bear, July 24, 2018: US10032463

An automatic speech recognition (“ASR”) system produces, for particular users, customized speech recognition results by using data regarding prior interactions of the users with the system. A portion of the ASR system (e.g., a neural-network-based language model) can be trained to produce an encoded ...


5
Shiva Kumar Sundaram, Chao Wang, Shiv Naga Prasad Vitaladevuni, Spyridon Matsoukas, Arindam Mandal: User presence detection. Amazon Technologies, Pierce Atwood, November 6, 2018: US10121494

A speech-capture device can capture audio data during wakeword monitoring and use the audio data to determine if a user is present nearby the device, even if no wakeword is spoken. Audio such as speech, human originating sounds (e.g., coughing, sneezing), or other human related noises (e.g., footste ...


6
Sri Venkata Surya Siva Rama Krishna Garimella, Spyridon Matsoukas, Ariya Rastrow, Bjorn Hoffmeister: Class-based discriminative training of speech models. Amazon Technologies, Knobbe Martens Olson & Bear, February 13, 2018: US09892726

Features are disclosed for modifying a statistical model to more accurately discriminate between classes of input data. A subspace of the total model parameter space can be learned such that individual points in the subspace, corresponding to the various classes, are discriminative with respect to t ...