Method and apparatus for identifying desired information-bearing records having predetermined identifiable characteristics from a set of such records in a base data file. A special retrieval file comprising arrays of binary-coded elements is produced and maintained from the information content of the base data file. Each array of the retrieval file corresponds to a particular predetermined identifiable characteristic of language structure potentially present in or associated with the set of records, and each element in such an array corresponds to, and is representative of, the address or location of a particular one of the records in the base data file. Each element is binary coded to represent the presence or absence, in the corresponding record, of the predetermined identifiable characteristic of language structure associated with that array. In one exemplary embodiment, the set of predetermined identifiable characteristics is chosen to represent the alphabetic value and relative sequential location of information characters within associated groups of characters, such as words, contained in the records. In this manner, the retrieval file represents an irreversible information compression of the language structure and/or information contained in the set of information-bearing records.
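As an illustration only, the construction of such a retrieval file might be sketched as follows. The sketch assumes one possible reading of the exemplary embodiment: each characteristic is a pair (letter, position of that letter within a word), and each array is held as a Python integer whose bit i stands for the record at address i. The function name `build_retrieval_file` and the sample records are hypothetical and do not come from the text above.

```python
def build_retrieval_file(records):
    """Return a dict mapping (letter, position) -> bit array (a Python int).

    Bit i of an array is 1 when the record at address i contains at least
    one word with that letter at that position, and 0 otherwise.
    This layout is an assumption made for illustration.
    """
    arrays = {}
    for address, text in enumerate(records):
        for word in text.lower().split():
            for position, letter in enumerate(word):
                if letter.isalpha():
                    key = (letter, position)
                    # Set the bit corresponding to this record's address.
                    arrays[key] = arrays.get(key, 0) | (1 << address)
    return arrays


# Hypothetical base data file: the list index plays the role of the address.
records = ["cat sat", "dog ran", "cab fare"]
retrieval_file = build_retrieval_file(records)

# Records 0 and 2 contain a word starting with "c", so bits 0 and 2 are set.
print(bin(retrieval_file[("c", 0)]))
```

Because only the presence or absence of each characteristic is kept per record, the original words cannot be reconstructed from the arrays, which is consistent with the compression being described as irreversible.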
To locate any particular desired record, the retrieval file is first searched by identifying and selecting those arrays that represent the desired predetermined identifiable characteristics of language structure and comparing the binary values of respectively corresponding elements in the selected arrays, thereby identifying which records in the base data file have all of the desired characteristics. Once the desired records in the base data file have been identified in this manner, they are selected and displayed, copied, or otherwise processed as desired to provide the requisite access to, or retrieval of, the information previously stored in the base data file. Particular choices and variations in the set of predetermined identifiable characteristics of language structure represented by the arrays in the retrieval file will change the search and retrieval characteristics, capabilities, and flexibility of the system, as may be desired for particular types of record sets and base data file formats.
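The search step can be sketched in the same spirit. The example below assumes the same hypothetical (letter, position) characteristics and integer bit arrays; the comparison of corresponding elements is modeled as a bitwise AND across the selected arrays. All names (`build_retrieval_file`, `candidate_addresses`, `retrieve`) and the sample data are illustrative assumptions. The final check of each candidate against the base data file is likewise an added assumption, included because, under this particular choice of characteristics, letters from different words of one record can jointly satisfy a query.

```python
def build_retrieval_file(records):
    """Map each (letter, position-in-word) pair to a bit array (an int)."""
    arrays = {}
    for address, text in enumerate(records):
        for word in text.lower().split():
            for position, letter in enumerate(word):
                key = (letter, position)
                arrays[key] = arrays.get(key, 0) | (1 << address)
    return arrays


def candidate_addresses(arrays, record_count, query):
    """Select the arrays for the query's characteristics and AND them."""
    result = (1 << record_count) - 1  # start with every record as a candidate
    for position, letter in enumerate(query.lower()):
        # A characteristic with no array is present in no record.
        result &= arrays.get((letter, position), 0)
    return [i for i in range(record_count) if (result >> i) & 1]


def retrieve(records, arrays, query):
    """Fetch candidates from the base data file, keeping true matches only."""
    hits = candidate_addresses(arrays, len(records), query)
    return [records[i] for i in hits if query.lower() in records[i].lower().split()]


# Hypothetical base data file; the list index plays the role of the address.
records = ["cat sat", "dog ran", "cab fare", "cot bat"]
arrays = build_retrieval_file(records)

print(candidate_addresses(arrays, len(records), "cat"))  # [0, 3]
print(retrieve(records, arrays, "cat"))                  # ['cat sat']
```

In this sample, record 3 ("cot bat") is reported as a candidate for "cat" because "c" at position 0 and "t" at position 2 come from "cot" while "a" at position 1 comes from "bat". A different choice of characteristics would change how often such false candidates arise, which is one way in which the choice of characteristics alters the search behavior of the system.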