05873056 is referenced by 354 patents and cites 32 patents.

A natural language processing system takes unformatted, naturally occurring text and generates a subject vector representation of it. The text may be an entire document or a part of one, such as its title, a paragraph, a clause, or a sentence. The subject codes are obtained from a lexical database: the subject code or codes for each word in the text are looked up in the database and assigned to that word. The database may be a dictionary or other word resource whose semantic classification scheme designates subject domains. Because the various meanings or senses of a word may be assigned multiple, different subject codes, psycholinguistically justified sense disambiguation is used to select the most appropriate subject field code. Preferably, an ordered set of sentence-level heuristics is used, based on the statistical probability that a given code is the most appropriate of the candidate codes. The subject codes produce a weighted, fixed-length vector (its length does not depend on the length of the document) that represents the semantic content of the text and may be used for purposes such as information retrieval, text categorization, machine translation, document detection, question answering, and, more generally, extracting knowledge from the document. The system has particular utility in classifying documents by their general subject matter and in retrieving documents relevant to a query.
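
The pipeline described in the abstract (look up subject codes per word, disambiguate within each sentence, accumulate a fixed-length weighted vector) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the subject codes in `SUBJECT_CODES`, the toy `LEXICON`, the function names, and the two simple heuristics, which stand in for the ordered, statistically grounded heuristics the abstract mentions.

```python
from collections import Counter

# Hypothetical subject codes; the vector always has one slot per code,
# so its length does not depend on the length of the text.
SUBJECT_CODES = ["ECON", "GEOG", "LAW", "MED"]

# Toy lexicon (an assumption): each word maps to the subject codes of its senses.
LEXICON = {
    "bank": ["ECON", "GEOG"],
    "loan": ["ECON"],
    "interest": ["ECON", "LAW"],
    "river": ["GEOG"],
    "court": ["LAW"],
    "doctor": ["MED"],
}


def disambiguate(sentence_words):
    """Choose one subject code for each known word in a sentence."""
    candidates = [LEXICON[w] for w in sentence_words if w in LEXICON]
    # Simplified heuristic 1: words with a single code are unambiguous
    # and serve as context for the rest of the sentence.
    context = Counter(codes[0] for codes in candidates if len(codes) == 1)
    chosen = []
    for codes in candidates:
        if len(codes) == 1:
            chosen.append(codes[0])
        else:
            # Simplified heuristic 2: prefer the code with the most support
            # from the context; max() keeps the first listed code on ties.
            chosen.append(max(codes, key=lambda code: context[code]))
    return chosen


def subject_vector(text):
    """Return a normalized, fixed-length subject vector for the text."""
    counts = Counter()
    for sentence in text.lower().split("."):
        counts.update(disambiguate(sentence.split()))
    total = sum(counts.values())
    return [counts[code] / total if total else 0.0 for code in SUBJECT_CODES]


if __name__ == "__main__":
    print(subject_vector("The bank approved the loan. The river bank flooded."))
```

In this sketch the word "bank" receives the code `ECON` in the first sentence and `GEOG` in the second, because the unambiguous neighbors "loan" and "river" supply the context. A real system would draw both the codes and the heuristic ordering from the lexical database and corpus statistics.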

Title
Natural language processing system for semantic vector representation which accounts for lexical ambiguity
Application Number
8/135815
Publication Number
5873056
Application Date
October 12, 1993
Publication Date
February 16, 1999
Inventor
Edmund Szu li Yu, Syracuse, NY, US
Woojin Paik, Syracuse, NY, US
Elizabeth D Liddy, Syracuse, NY, US
Agent
M Lukacher
K J Lukacher
Assignee
The Syracuse University, NY, US
IPC
G06F 17/22
G06F 17/20
G06F 17/30