Patent 6138113 is referenced by 94 patents and cites 8 patents.

A method is described for identifying pages that are near duplicates in a linked database. In the linked database, pages can have incoming links and outgoing links. Two pages are selected, a first page and a second page. For each selected page, the number of outgoing links is determined. The two pages are marked as near duplicates based on the number of common outgoing links for the two pages.
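The abstract does not give the exact decision rule. The following is a minimal Python sketch of the pairwise comparison. It assumes that each page's outgoing links are available as a set of URLs. It also assumes a rule in which both pages must have more than a minimum number of outgoing links and must share a large fraction of them. `are_near_duplicates`, `MIN_OUTLINKS`, and `MIN_OVERLAP` are hypothetical names, and the threshold values are illustrative choices, not values taken from the patent.

```python
from typing import Set

# Hypothetical thresholds: the abstract only says the decision is "based on the
# number of common outgoing links"; these values are illustrative assumptions.
MIN_OUTLINKS = 10   # each page must have more than this many outgoing links
MIN_OVERLAP = 0.95  # fraction of each page's outlinks that must be shared


def are_near_duplicates(
    first_outlinks: Set[str],
    second_outlinks: Set[str],
    min_outlinks: int = MIN_OUTLINKS,
    min_overlap: float = MIN_OVERLAP,
) -> bool:
    """Return True if two pages look like near duplicates by their outgoing links."""
    # Determine the number of outgoing links of each selected page.
    n_first = len(first_outlinks)
    n_second = len(second_outlinks)
    # Pages with very few outgoing links give too little evidence to compare.
    if n_first <= min_outlinks or n_second <= min_outlinks:
        return False
    # Count the outgoing links the two pages have in common.
    common = len(first_outlinks & second_outlinks)
    # Mark as near duplicates only if the shared links make up a large enough
    # fraction of BOTH pages' outgoing links.
    return common >= min_overlap * n_first and common >= min_overlap * n_second


if __name__ == "__main__":
    base = {f"http://example.com/doc{i}" for i in range(40)}
    mirror = (base - {"http://example.com/doc0"}) | {"http://mirror.example.org/"}
    print(are_near_duplicates(base, mirror))  # True: 39 of 40 links are shared
```

Requiring the overlap fraction on both sides keeps the test symmetric. Under this assumed rule, a page whose few links are all contained in a much larger page's link set is not marked as its near duplicate.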

Title: Method for identifying near duplicate pages in a hyperlinked database
Application Number: 9/131469
Publication Number: 6138113
Application Date: August 10, 1998
Publication Date: October 24, 2000
Inventors: Monika R Henzinger (Menlo Park, CA, US); Jeffrey Dean (Menlo Park, CA, US)
Agent: Skjerven Morrill MacPherson
Assignee: AltaVista Company (CA, US)
IPC: G06F 17/30