The present invention is based on the use of linguistic, especially phonological, knowledge to guide the speech recognition process. A speech signal containing an utterance is received, and linguistic cues in the speech signal are detected. From these detected linguistic cues, a symbolic representation of the contents of the speech signal is generated. This symbolic representation comprises at least one word division, wherein each word division consists of an onset-rhyme pair and associated phonological elements. These phonological elements are univalent, may appear in all languages, are distinguishable from each other, and are directly interpretable in the speech signal. A lexicon of predetermined symbolic representations is provided for words in a particular language. A best match to the generated symbolic representation is found in the lexicon, thereby recognizing the spoken word.
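The lexicon-matching step described above can be illustrated with a minimal sketch. It is not the claimed implementation: the element labels (`H`, `L`, `A`, `I`, `U`), the lexicon entries, the helper names (`division`, `distance`, `best_match`) and the mismatch-count scoring rule are all assumptions chosen for illustration. Each word division is modeled as an onset-rhyme pair of element sets, and because the elements are univalent, an element is either present in a set or absent from it.

```python
from typing import Dict, FrozenSet, Tuple

Elements = FrozenSet[str]
Division = Tuple[Elements, Elements]   # (onset elements, rhyme elements)
Representation = Tuple[Division, ...]  # one or more word divisions

EMPTY: Division = (frozenset(), frozenset())


def division(onset: str, rhyme: str) -> Division:
    """Build one onset-rhyme pair from space-separated element labels."""
    return frozenset(onset.split()), frozenset(rhyme.split())


def distance(a: Representation, b: Representation) -> int:
    """Count elements present in one representation but not the other.

    Divisions are compared position by position; a division with no
    counterpart is compared against an empty onset-rhyme pair.
    """
    total = 0
    for i in range(max(len(a), len(b))):
        a_onset, a_rhyme = a[i] if i < len(a) else EMPTY
        b_onset, b_rhyme = b[i] if i < len(b) else EMPTY
        total += len(a_onset ^ b_onset) + len(a_rhyme ^ b_rhyme)
    return total


def best_match(observed: Representation,
               lexicon: Dict[str, Representation]) -> str:
    """Return the lexicon word whose representation is closest."""
    return min(lexicon, key=lambda word: distance(observed, lexicon[word]))


# Hypothetical lexicon: placeholder words with placeholder element labels.
LEXICON: Dict[str, Representation] = {
    "word_a": (division("H", "A"), division("L", "I")),
    "word_b": (division("H", "U"), division("L", "I")),
    "word_c": (division("H L", "A I"),),
}

# Representation generated from the detected cues; one onset element
# of the second division was not detected.
observed = (division("H", "A"), division("", "I"))
print(best_match(observed, LEXICON))  # prints "word_a"
```

In this sketch a missed cue costs one mismatch, so the observed representation still selects the intended entry. A practical scoring rule would likely weight elements and positions differently.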