A method for storing and searching documents, which is also useful in disambiguating word senses, and a method for generating a dictionary of context vectors. The dictionary provides a context vector for each word stem it contains. A context vector is a fixed-length list of component values corresponding to a list of word-based features, each component value being an approximate measure of the conceptual relationship between the word stem and the corresponding word-based feature. Documents are stored by combining the context vectors of the words remaining in the document after uninteresting words are removed. The context vectors of the remaining words are added to obtain a summary vector, which is then normalized, and the normalized summary vector is stored for each document. The database of normalized summary vectors is searched by using a query vector and identifying the document whose vector is closest to that query vector. The normalized summary vectors of the documents can be stored in cluster trees according to a centroid-consistent algorithm to accelerate the searching process. Said searching process also gives an efficient way of finding nearest-neighbor vectors in high-dimensional spaces.
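
The storage and search steps described above can be illustrated with a minimal sketch. It makes several assumptions that are not in the abstract: the three-component context vectors, the word list, and the stop-word set are invented toy values; words are used directly in place of word stems; "closest" is taken to mean the largest dot product between unit-length vectors; and the search is a plain linear scan, not the cluster-tree method. All function and variable names are hypothetical.

```python
import math

# Hypothetical toy dictionary: word -> context vector over 3 word-based features.
# Real context vectors would be far longer and keyed by word stem.
CONTEXT_VECTORS = {
    "bank": [0.9, 0.1, 0.0],
    "loan": [0.8, 0.2, 0.0],
    "river": [0.0, 0.1, 0.9],
    "water": [0.1, 0.0, 0.9],
}

# Assumed list of "uninteresting" words to drop before summing.
STOP_WORDS = {"the", "a", "of", "by", "from"}


def normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def summary_vector(words):
    """Add the context vectors of the remaining words, then normalize the sum."""
    dim = len(next(iter(CONTEXT_VECTORS.values())))
    total = [0.0] * dim
    for word in words:
        word = word.lower()
        # Skip uninteresting words and words missing from the toy dictionary.
        if word in STOP_WORDS or word not in CONTEXT_VECTORS:
            continue
        total = [t + c for t, c in zip(total, CONTEXT_VECTORS[word])]
    return normalize(total)


def closest_document(query_words, index):
    """Return the id of the stored document whose vector is closest to the query.

    Closeness is assumed here to be the dot product of unit vectors;
    the scan is linear over all stored documents.
    """
    query = summary_vector(query_words)
    return max(
        index,
        key=lambda doc_id: sum(q * d for q, d in zip(query, index[doc_id])),
    )


# Store one normalized summary vector per document.
documents = {
    "finance": "the bank loan".split(),
    "nature": "water from the river".split(),
}
index = {doc_id: summary_vector(words) for doc_id, words in documents.items()}

print(closest_document(["loan"], index))
print(closest_document(["river"], index))
```

With these toy values, a query made of "loan" lands nearest the "finance" document and a query made of "river" lands nearest "nature". Because every stored vector has unit length, ranking by dot product is equivalent to ranking by cosine similarity; a cluster tree would replace the linear scan in `closest_document` when the collection is large.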