Viswanath Poosala, Venkatesh Ganti: Method, apparatus and programmed medium for approximating the data cube and obtaining approximate answers to queries in relational databases. Lucent Technologies, Dickstein Shapiro Morin & Oshinsky, August 22, 2000: US06108647 (62 worldwide citations)

A novel and unique method of approximating the data cube and summarizing database data in order to provide quick and approximate answers to aggregate queries by precomputing a summary of the data cube using histograms and answering queries using the substantially smaller summary. A unique method acc ...
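
The idea of answering an aggregate query from a small histogram instead of the full data can be illustrated with a minimal sketch. It is not the patented method: the equi-width bucketing, the uniform-spread assumption inside each bucket, and the names `build_histogram` and `approx_count` are all assumptions made for illustration, and the sketch covers a single numeric attribute rather than a multidimensional data cube.

```python
def build_histogram(values, lo, hi, buckets):
    """Summarize values in [lo, hi] as equi-width bucket counts.

    Assumes every value lies within [lo, hi] (hypothetical helper).
    """
    width = (hi - lo) / buckets
    counts = [0] * buckets
    for v in values:
        # Clamp so that v == hi falls into the last bucket.
        idx = min(int((v - lo) / width), buckets - 1)
        counts[idx] += 1
    return counts, lo, width


def approx_count(hist, a, b):
    """Estimate COUNT(*) for a <= value < b using only the summary.

    Assumes values are spread uniformly inside each bucket.
    """
    counts, lo, width = hist
    total = 0.0
    for i, c in enumerate(counts):
        b_lo = lo + i * width
        b_hi = b_lo + width
        overlap = max(0.0, min(b, b_hi) - max(a, b_lo))
        total += c * overlap / width
    return total


values = list(range(100))
hist = build_histogram(values, 0, 100, 10)
estimate = approx_count(hist, 25, 35)
```

The summary here holds 10 counts instead of 100 values; the estimate is exact only when the data really is uniform inside each bucket, which is the usual source of error in histogram-based approximation.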

Surajit Chaudhuri, Kris Ganjam, Venkatesh Ganti, Rajeev Motwani: Efficient fuzzy match for evaluating data records. Microsoft Corporation, November 13, 2007: US07296011 (61 worldwide citations)

To help ensure high data quality, data warehouses validate and, if needed, clean incoming data tuples from external sources. In many situations, input tuples or portions of input tuples must match acceptable tuples in a reference table. For example, product name and description fields in a sales reco ...
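
Matching an input value against a reference table can be sketched as follows. The similarity function used in the patent is not visible in the truncated abstract, so this sketch substitutes Jaccard similarity over character trigrams and a linear scan of the reference table; the names `qgrams`, `jaccard`, `best_match`, and the threshold of 0.5 are illustrative assumptions.

```python
def qgrams(text, q=3):
    """Return the set of character q-grams of text, padded with '#'."""
    padded = "#" * (q - 1) + text.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def best_match(input_value, reference, threshold=0.5):
    """Return (closest reference value, score), or (None, score) below threshold."""
    grams = qgrams(input_value)
    best, best_score = None, 0.0
    for ref in reference:
        score = jaccard(grams, qgrams(ref))
        if score > best_score:
            best, best_score = ref, score
    if best_score < threshold:
        return None, best_score
    return best, best_score


reference = ["Microsoft Office 2003", "Adobe Photoshop"]
match, score = best_match("Microsft Office 2003", reference)
```

A linear scan is only workable for small reference tables; an efficient implementation would need some form of indexing, which this sketch does not attempt.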

Rajeev Motwani, Surajit Chaudhuri, Venkatesh Ganti: Robust detector of fuzzy duplicates. Microsoft Corporation, Lee & Hayes PLLC, April 7, 2009: US07516149 (32 worldwide citations)

At least one implementation, described herein, detects fuzzy duplicates and eliminates such duplicates. Fuzzy duplicates are multiple, seemingly distinct tuples (i.e., records) in a database that represent the same real-world entity or phenomenon.

Rahul Kapoor, Venkatesh Ganti, Surajit Chaudhuri: Duplicate data elimination system. Microsoft Corporation, October 23, 2007: US07287019 (30 worldwide citations)

A process for finding similar data records from a set of data records. A database table or tables provide a number of data records from which one or more canonical data records are identified. Tokens are identified within the data records and classified according to attribute field. A similarity s ...

Surajit Chaudhuri, Venkatesh Ganti, Rohit Ananthakrishna: Detecting duplicate records in database. Microsoft Corporation, Microsoft Corporation, November 1, 2005: US06961721 (27 worldwide citations)

The invention concerns the detection of duplicate tuples in a database. Previous domain-independent detection of duplicate tuples relied on standard similarity functions (e.g., edit distance, cosine metric) between multi-attribute tuples. However, such prior art approaches result in large numbers of ...

Kaushik Chakrabarti, Venkatesh Ganti, Dong Xin: Efficient evaluation of object finder queries. Microsoft Corporation, Lee & Hayes PLLC, June 1, 2010: US07730060 (18 worldwide citations)

The subject disclosure pertains to a class of object finder queries that return the best target objects that match a set of given keywords. Mechanisms are provided that facilitate identification of target objects related to search objects that match a set of query keywords. Scoring mechanisms/functi ...

Bee Chung Chen, Venkatesh Ganti, Kaushik Shriraghav: Designing record matching queries utilizing examples. Microsoft Corporation, Lee & Hayes PLLC, December 15, 2009: US07634464 (11 worldwide citations)

The subject disclosure pertains to a powerful and flexible framework for record matching. The framework facilitates design of a record matching query or package composed of a set of well-defined primitive operators (e.g., relational, data cleaning . . . ), which can ultimately be executed to match r ...

Surajit Chaudhuri, Venkatesh Ganti, Kaushik Shriraghav: Systems and methods for estimating functional relationships in a database. Microsoft Corporation, Lee & Hayes PLLC, July 14, 2009: US07562067 (10 worldwide citations)

A system that facilitates estimating functional relationships associated with one or more columns in a database comprises a sampling component that receives a random sample of records within the database. An estimate generator component calculates an estimate of strength of functional relationships ...
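
Estimating the strength of a functional relationship from a random sample can be sketched as below. The abstract is truncated before it defines the strength measure, so this sketch assumes one: the fraction of rows that agree with the most common `y` value for their `x` value. The function names, the fixed seed, and the example columns `zip` and `city` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def fd_strength(rows, x, y):
    """Fraction of rows consistent with the majority y value for their x value.

    Returns 1.0 when x fully determines y in the given rows.
    """
    if not rows:
        return 1.0
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[x]][row[y]] += 1
    consistent = sum(max(c.values()) for c in groups.values())
    return consistent / len(rows)


def estimate_fd_strength(rows, x, y, sample_size, seed=0):
    """Estimate the strength of x -> y from a uniform random sample of rows."""
    rng = random.Random(seed)
    if len(rows) <= sample_size:
        sample = rows
    else:
        sample = rng.sample(rows, sample_size)
    return fd_strength(sample, x, y)


rows = [
    {"zip": "98052", "city": "Redmond"},
    {"zip": "98052", "city": "Redmond"},
    {"zip": "98052", "city": "Seattle"},
    {"zip": "10001", "city": "New York"},
]
strength = estimate_fd_strength(rows, "zip", "city", sample_size=10)
```

With this particular measure a small sample tends to overstate the strength, because an `x` value seen only once always looks consistent; a careful estimator would need to correct for that.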

Arnd Christian Konig, Venkatesh Ganti: Leveraging cross-document context to label entity. Microsoft Corporation, June 28, 2011: US07970808 (9 worldwide citations)

Entities, such as people, places and things, are labeled based on information collected across a possibly large number of documents. One or more documents are scanned to recognize the entities, and features are extracted from the context in which those entities occur in the documents. Observed entit ...

Venkatesh Ganti, Vassilakis Theodore, Yevgeny Agichtein: Segmentation of strings into structured records. Microsoft Corporation, December 1, 2009: US07627567 (9 worldwide citations)

A system for segmenting strings into component parts for use with a database management system. A reference table of string records is segmented into multiple substrings corresponding to database attributes. The substrings within an attribute are analyzed to provide a state model that assumes a be ...