1
Jan O Pedersen, Per Kristian Halvorsen, Douglass R Cutting, John W Tukey, Eric A Bier, Daniel G Bobrow: Iterative technique for phrase query formation and an information retrieval system employing same. Xerox Corporation, Oliff & Berridge, January 11, 1994: US05278980 (435 worldwide citations)

An information retrieval system and method are provided in which an operator inputs one or more query words which are used to determine a search key for searching through a corpus of documents, and which returns any matches between the search key and the corpus of documents as a phrase containing th ...


2
Jan O Pedersen, David Karger, Douglass R Cutting, John W Tukey: Scatter-gather: a cluster-based method and apparatus for browsing large document collections. Xerox Corporation, Oliff & Berridge, August 15, 1995: US05442778 (183 worldwide citations)

Scatter-Gather is a computer-based document browsing method which operates in time proportional to the number of documents in a target corpus. The Scatter-Gather method includes: preparing an initial ordering of the corpus using, for example, an off-line computational method; determining a summary of ...


3
John W Tukey, Jan O Pedersen: Method and apparatus for information access employing overlapping clusters. Xerox Corporation, December 7, 1999: US05999927 (116 worldwide citations)

The present invention is a method and apparatus for document clustering-based browsing of a corpus of documents, and more particularly to the use of overlapping clusters to improve recall. The present invention is directed to improving the performance of information access methods and apparatus thro ...


4
Jan O Pedersen, John W Tukey: Method and apparatus for automatic document summarization. Xerox Corporation, Oliff & Berridge, June 10, 1997: US05638543 (113 worldwide citations)

Regions of a document such as sentences and blocks of sentences are scored and classified based upon their scores. An abstract of the document can be formed from the classified sentences. Sentences are classified by the use of words classified as stop words and vanish words. Sentences are scored bas ...


5
John W Tukey, Jan O Pedersen: Method and apparatus for information accesss employing overlapping clusters. Xerox Corporation, Duane C Basch, July 28, 1998: US05787422 (78 worldwide citations)

The present invention is a method and apparatus for document clustering-based browsing of a corpus of documents, and more particularly to the use of overlapping clusters to improve recall. The present invention is directed to improving the performance of information access methods and apparatus thro ...


6
John W Tukey, Jan O Pedersen: Method of ordering document clusters without requiring knowledge of user interests. Xerox Corporation, Tracy L Hurt, July 28, 1998: US05787420 (30 worldwide citations)

A computerized method of ordering document clusters for presentation after browsing a corpus of documents that presents document clusters in a logical fashion in the absence of any indication of the computer user's interests. The method begins by grouping the corpus into a plurality of clusters, eac ...


7
Francine R Chen, Dan S Bloomberg, John W Tukey: Automatic method of generating thematic summaries from a document image without performing character recognition. Xerox Corporation, December 8, 1998: US05848191 (23 worldwide citations)

A method of automatically generating a thematic summary from a document image without performing character recognition to generate an ASCII representation of the document text. The method begins with decomposition of the document image into text blocks, and text lines. Using the median x-height of t ...


8
John W Tukey, Jan O Pedersen: Method of ordering document clusters given some knowledge of user interests. Xerox Corporation, June 8, 1999: US05911140 (23 worldwide citations)

A method of automatically ordering the presentation of document clusters generated from a ranked corpus of documents. First, the corpus is ordered into a plurality of clusters. Next, a rank is determined for each cluster based upon the rank of a document within that cluster. Afterward, the clusters ...


9
Dan S Bloomberg, John W Tukey, M Margaret Withgott: Detecting function words without converting a scanned document to character codes. Xerox Corporation, Oliff & Berridge, October 3, 1995: US05455871 (15 worldwide citations)

A method and apparatus detects function words in a first image of a scanned document without first converting the image to character codes. Function words include determiners, prepositions, articles, and other words that play a largely grammatical role, as opposed to words such as nouns and verbs th ...


10
Francine R Chen, John W Tukey: Automatic method of identifying drop words in a document image without performing character recognition. Xerox Corporation, Tracy L Hurt, December 15, 1998: US05850476 (13 worldwide citations)

A method of automatically identifying drop words in a document image without performing character recognition to generate an ASCII representation of the document text. First, the document image is analyzed to identify word equivalence classes, each of which represents at least one word of the multip ...