The present invention is a system and method for retrieving segments of stored video programs using closed caption text data. Closed caption text data is extracted from the video programming signals received by the invention, and text records are generated from the extracted data. Each text record is derived from the closed caption data for a single continuous video segment and serves as an index, or key, for retrieving that segment. Preferably, each text record (a) has sufficient content to adequately describe the video segment it indexes; and (b) corresponds to a video segment focused on a small number of topics. To accomplish (a) and (b), the present invention generates each text record so that it has a predetermined maximum length and is derived from the closed caption data for a single uninterrupted speaker. During video data retrieval, video requests or queries input by users are evaluated either by comparing query terms with terms in the text records, or by comparing an interpretation of the query terms with an interpretation of the terms in the text records. The video segment location information associated with each text record satisfying a query is then used to retrieve the video segment that the text record indexes.
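
The record generation and term comparison described above can be illustrated with a minimal sketch. It is not the invention's actual implementation; it rests on several assumptions that the text does not state: caption data arrives as (timestamp in seconds, caption line) pairs, a leading ">>" marks a change of speaker, the predetermined maximum length is counted in words, and a query is satisfied when every query term literally appears in a record. The names `TextRecord`, `build_records`, and `search` are hypothetical, and the alternative of comparing interpretations of terms is not shown.

```python
from dataclasses import dataclass


@dataclass
class TextRecord:
    text: str     # caption text used as the index for one video segment
    start: float  # segment location: timestamp of the first caption line used
    end: float    # timestamp of the last caption line used (approximate end)


def build_records(captions, max_words=50):
    """Split caption lines into records of at most max_words words each.

    Assumption: a line beginning with ">>" signals a new speaker, so a
    record never spans more than one uninterrupted speaker.
    """
    records = []
    words = []
    start = None
    end = None

    def flush():
        # Close the record in progress, if it has any words.
        nonlocal words, start
        if words:
            records.append(TextRecord(" ".join(words), start, end))
        words = []
        start = None

    for timestamp, line in captions:
        if line.startswith(">>"):
            flush()                    # speaker change ends the record
            line = line.lstrip("> ")   # drop the marker itself
        for word in line.split():
            if len(words) >= max_words:
                flush()                # maximum length reached
            if start is None:
                start = timestamp
            words.append(word)
            end = timestamp
    flush()
    return records


def search(records, query):
    """Return records containing every query term (literal comparison)."""
    terms = {t.lower() for t in query.split()}
    hits = []
    for record in records:
        record_terms = {w.strip(".,!?").lower() for w in record.text.split()}
        if terms <= record_terms:
            hits.append(record)
    return hits


if __name__ == "__main__":
    sample = [
        (0.0, ">> Good evening. The senate passed the budget"),
        (4.0, "bill today after a long debate."),
        (9.0, ">> In sports, the home team won."),
    ]
    for hit in search(build_records(sample), "senate budget"):
        # The start/end pair stands in for the video segment location.
        print(hit.start, hit.end, hit.text)
```

Each hit carries the `start` and `end` values that would be handed to the video store to fetch the matching segment. A smaller `max_words` yields shorter, more narrowly focused segments at the cost of less descriptive text per record.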