In a hidden Markov model (HMM) based speech recognition system, multilayer perceptrons (MLPs) are used for context-dependent estimation of a plurality of state-dependent observation probability distributions of phonetic classes. The estimate is obtained by a Bayesian factorization of the observation likelihood in terms of posterior probabilities of phone classes given the context and the input speech vector. The context-dependent estimates serve as the state-dependent observation probabilities required as parameter input to a hidden Markov model speech processor, which identifies the word sequence represented by the unknown speech input, itself supplied as a sequence of input speech vectors. Within the speech processor, models are provided which employ the observation probabilities in the recognition process. The number of context-dependent nets is reduced to a single net by sharing the units of the input layer and the hidden layer, and the weights connecting them, in the multilayer perceptron while providing one output layer for each relevant context. Each output layer is trained as an independent network on the specific examples of the context it represents. Training may be optimized to yield a set of weights intermediate between the context-independent weights and the context-dependent weights to which training would normally converge.
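
By way of illustration only, the factorization can be written using Bayes' rule as p(x | q, c) = P(q | x, c) p(x | c) / P(q | c), with p(x | c) = P(c | x) p(x) / P(c), where x is the input speech vector, q a phone class, and c a context. Dividing by p(x), which is the same for every state at a given frame, gives a scaled likelihood. The following Python/NumPy sketch shows a single net with shared input-to-hidden weights and one output layer per context, together with that scaled likelihood. The class and function names, the sigmoid and softmax nonlinearities, the random initialization, and the assumption that P(c | x), P(q | c), and P(c) are supplied from elsewhere (for example, a separate estimator and relative frequencies in the training data) are all hypothetical choices, not details taken from the text above.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class SharedHiddenContextMLP:
    """Hypothetical single net: shared input and hidden layers,
    one output layer per context (names and sizes are assumptions)."""

    def __init__(self, n_in, n_hidden, n_phones, n_contexts, seed=0):
        rng = np.random.default_rng(seed)
        # Input-to-hidden weights, shared by every context.
        self.W_h = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b_h = np.zeros(n_hidden)
        # One hidden-to-output weight matrix per context.
        self.W_o = rng.normal(0.0, 0.1, (n_contexts, n_hidden, n_phones))
        self.b_o = np.zeros((n_contexts, n_phones))

    def hidden(self, x):
        """Sigmoid hidden activations, identical for all contexts."""
        return 1.0 / (1.0 + np.exp(-(x @ self.W_h + self.b_h)))

    def phone_posteriors(self, x, c):
        """Estimate P(q | x, c) using the output layer of context c."""
        h = self.hidden(x)
        return softmax(h @ self.W_o[c] + self.b_o[c])


def scaled_likelihood(p_q_given_xc, p_c_given_x, p_q_given_c, p_c):
    """Return p(x | q, c) / p(x) via the Bayes factorization:
    P(q | x, c) * P(c | x) / (P(q | c) * P(c))."""
    return p_q_given_xc * p_c_given_x / (p_q_given_c * p_c)
```

In such a sketch, training the output layer of one context on only that context's examples would update `W_o[c]` and `b_o[c]` for that value of `c`; how the shared weights `W_h` are treated during that step is left open here.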