Patent 5418951 is cited by 421 patents and itself cites 8 patents.

A method of identifying, retrieving, or sorting documents by language or topic, involving the steps of: creating an n-gram array for each document in a database; parsing an unidentified document or query into n-grams; assigning a weight to each n-gram; removing the commonality from the n-grams; comparing each unidentified document or query to each database document; scoring the unidentified document or query against each database document for similarity; and, based on the similarity score, identifying, retrieving, or sorting the document or query with respect to language or topic.
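The steps in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not specify: character n-grams with a hypothetical default of n = 3, weighting by relative frequency, "removing the commonality" taken to mean subtracting the mean (centroid) vector of the database, and cosine similarity as the score. The function names (`ngram_vector`, `centroid`, `subtract`, `cosine`, `rank`) are hypothetical and are not drawn from the patent.

```python
from collections import Counter
import math


def ngram_vector(text: str, n: int = 3) -> dict[str, float]:
    """Parse text into overlapping character n-grams, weighted by relative frequency.
    (Relative-frequency weighting and n=3 are assumptions for illustration.)"""
    text = " ".join(text.lower().split())  # normalize case and whitespace
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def centroid(vectors: list[dict[str, float]]) -> dict[str, float]:
    """Mean of the database vectors, assumed here to be the 'commonality'."""
    acc: dict[str, float] = {}
    for v in vectors:
        for g, w in v.items():
            acc[g] = acc.get(g, 0.0) + w
    k = len(vectors)
    return {g: w / k for g, w in acc.items()}


def subtract(v: dict[str, float], c: dict[str, float]) -> dict[str, float]:
    """Remove the commonality by subtracting the centroid component-wise."""
    return {g: v.get(g, 0.0) - c.get(g, 0.0) for g in set(v) | set(c)}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (assumed scoring function)."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(query: str, database: list[str], n: int = 3) -> list[tuple[int, float]]:
    """Score the query against every database document and sort by similarity.
    Returns (document index, score) pairs, best match first."""
    vecs = [ngram_vector(d, n) for d in database]
    c = centroid(vecs)
    q = subtract(ngram_vector(query, n), c)
    scores = [cosine(q, subtract(v, c)) for v in vecs]
    return sorted(enumerate(scores), key=lambda p: p[1], reverse=True)


if __name__ == "__main__":
    db = ["the cat sat on the mat", "der hund lief über die straße", "a cat sat on a mat"]
    for idx, score in rank("the cat sat on a mat", db):
        print(idx, round(score, 3))
```

For language identification, the database would hold reference text for each language and the label of the top-ranked document would be assigned to the query; the same ranking serves topic retrieval or sorting. Subtracting the centroid suppresses n-grams that are frequent across the whole database, so the score is driven by what distinguishes one document from the rest.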

Title
Method of retrieving documents that concern the same topic
Application Number
932522
Publication Number
5418951
Application Date
September 30, 1994
Publication Date
May 23, 1995
Inventor
Marc Damashek
Hampstead
MD, US
Agent
Robert D Morelli
Thomas O Maser
Assignee
The United States of America represented by the Director of National Security Agency
DC, US
IPC
G06F 7/00