1
Stephane Herman Maes, Chalapathy Venkata Neti: System and method for multi-modal focus detection, referential ambiguity resolution and mood classification using multi-modal input. International Business Machines Corporation, November 8, 2005: US06964023 (339 worldwide citations)

Systems and methods are provided for performing focus detection, referential ambiguity resolution and mood classification in accordance with multi-modal input data, in varying operating conditions, in order to provide an effective conversational computing environment for one or more users.


2
Sankar Basu, Philippe Christian de Cuetos, Stephane Herman Maes, Chalapathy Venkata Neti, Andrew William Senior: Methods and apparatus for audio-visual speech detection and recognition. International Business Machines Corporation, Thu Ann Dang, Ryan Mason & Lewis, July 15, 2003: US06594629 (76 worldwide citations)

In a first aspect of the invention, methods and apparatus for providing speech recognition comprise the steps of processing a video signal associated with an arbitrary content video source, processing an audio signal associated with the video signal, and decoding the processed audio signal in conjun ...


3
Sankar Basu, Homayoon S M Beigi, Stephane Herman Maes, Benoit Emmanuel Ghislain Maison, Chalapathy Venkata Neti, Andrew William Senior: Methods and apparatus for audio-visual speaker recognition and utterance verification. International Business Machines Corporation, Paul J Otterstedt, Ryan Mason & Lewis, April 17, 2001: US06219640 (64 worldwide citations)

Methods and apparatus for performing speaker recognition comprise processing a video signal associated with an arbitrary content video source and processing an audio signal associated with the video signal. Then, an identification and/or verification decision is made based on the processed audio sig ...


4
Sankar Basu, Philippe Christian de Cuetos, Stephane Herman Maes, Chalapathy Venkata Neti, Andrew William Senior: Method and apparatus for audio-visual speech detection and recognition. International Business Machines Corporation, Thu Ann Dang, Ryan Mason & Lewis, November 9, 2004: US06816836 (63 worldwide citations)

Techniques for providing speech recognition comprise the steps of processing a video signal associated with an arbitrary content video source, processing an audio signal associated with the video signal, and recognizing at least a portion of the processed audio signal, using at least a portion of th ...


5
Chalapathy Venkata Neti, Salim Estephan Roukos: Speech recognition models combining gender-dependent and gender-independent phone states and using phonetic-context-dependence. International Business Machines Corporation, Robert P Tassinari Jr, September 14, 1999: US05953701 (58 worldwide citations)

A method of gender dependent speech recognition includes the steps of identifying phone state models common to both genders, identifying gender specific phone state models, identifying a gender of a speaker and recognizing acoustic data from the speaker. A method of constructing a gender-dependent s ...


6
Chalapathy Venkata Neti, Nitendra Rajput, L Venkata Subramaniam, Ashish Verma: Language context dependent data labeling. International Business Machines Corporation, Pete Tennent, Ryan Mason & Lewis, November 13, 2007: US07295979 (8 worldwide citations)

Bootstrapping of a system from one language to another often works well when the two languages share a similar acoustic space. However, when the new language has sounds that do not occur in the language from which the bootstrapping is to be done, bootstrapping does not produce good initial models ...


7
Jonathan H Connell, Norman Haas, Etienne Marcheret, Chalapathy Venkata Neti, Gerasimos Potamianos: Audio-only backoff in audio-visual speech recognition system. International Business Machines Corporation, Ryan Mason & Lewis, July 31, 2007: US07251603 (6 worldwide citations)

Techniques for performing audio-visual speech recognition, with improved recognition performance, in a degraded visual environment. For example, in one aspect of the invention, a technique for use in accordance with an audio-visual speech recognition system for improving a recognition performance th ...


8
Jonathan H Connell, Norman Haas, Etienne Marcheret, Chalapathy Venkata Neti, Gerasimos Potamianos: Audio-only backoff in audio-visual speech recognition system. International Business Machines Corporation, Ryan Mason & Lewis, December 23, 2004: US20040260554-A1

Techniques for performing audio-visual speech recognition, with improved recognition performance, in a degraded visual environment. For example, in one aspect of the invention, a technique for use in accordance with an audio-visual speech recognition system for improving a recognition performance th ...


9
Giridharan R Iyengar, Chalapathy Venkata Neti, Harriet Jane Nock: Method, apparatus, and program for cross-linking information sources using multiple modalities. International Business Machines Corporation, Duke W Yee, February 17, 2005: US20050038814-A1

A mechanism is provided for cross-linking information sources using multiple modalities. Text documents, images, audio sources, video, and other media are analyzed to determine media descriptors, which are metadata describing the content of the media sources. The media descriptors from all modalitie ...


10
Hugh William Adams, Giridharan Iyengar, Ching Yung Lin, Milind R Naphade, Chalapathy Venkata Neti, Harriet Jane Nock, John Richard Smith, Belle L Tseng: Apparatus and methods for semantic representation and retrieval of multimedia content. International Business Machines Corporation, Duke W Yee, June 10, 2004: US20040111432-A1

An apparatus and method for analyzing multimedia content to identify the presence of audio, visual and textual cues that together correspond to one or more high-level semantics are provided. The apparatus and method make use of one or more analysis models that are trained to analyze audio, visual an ...