06782357 is referenced by 18 patents.
Cluster- and pruning-based language model compression is disclosed. In one embodiment, a language model is first clustered, such as by using predictive clustering. The language model after clustering has a larger size than it did before clustering. The language model is then pruned, such as by using entropy-based techniques, such as Rosenfeld pruning, or by using Stolcke pruning or count-cutoff techniques. In one particular embodiment, a word language model is first predictively clustered by a technique described as P(Z|xy)×P(z|xyZ), where a lower-case letter refers to a word, and an upper-cluster letter refers to a cluster in which the word resides.