A system and method for maximizing data compression by optimizing model selection during coding of an input stream of data symbols, wherein at least two models are run and compared, and the model with the best coding performance for a given-size segment or block of compressed data is selected, such that only its block is used in the output data stream. The best performance is determined by 1) producing comparable-size blocks of compressed data from the input stream with each of the two or more models and 2) selecting the model that compresses the most input data. In the preferred embodiment, each model produces a respective string of data from the symbol data, and each string is coded into compressed data with an adaptive arithmetic coder. Each block of compressed data begins by coding the decision to use the model currently being run, and all models start with the arithmetic coder parameters established at the end of the preceding block. Only the compressed code stream of the best model is used in the output, and that code stream carries the overhead for selecting that model. Because the decision as to which model to use is made in the compressed data domain, i.e., the best model is chosen on the basis of which model coded the most input symbols for a given-size compressed block, rather than after a given number of input symbols has been coded, the model selection overhead scales with the compressed data. Successively selected compressed blocks are combined into an output code stream to produce an optimum output of compressed data, from the input symbols, for storage or transmission.
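The block-by-block selection described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the adaptive arithmetic coder is replaced by an ideal code-length estimate (minus log2 of an adaptive probability), the input symbols are assumed to be bits, and the two models (`raw_model`, `diff_model`), the helper names, and the `block_bits` budget are all hypothetical choices made for illustration.

```python
import math

def code_cost(state, context, decision):
    """Return the ideal cost in bits of one binary decision, then adapt the counts.

    Stand-in for an adaptive arithmetic coder: `state` maps a context name
    to [count of 0s, count of 1s] and plays the role of the coder parameters.
    """
    counts = state.setdefault(context, [1, 1])
    bits = -math.log2(counts[decision] / (counts[0] + counts[1]))
    counts[decision] += 1
    return bits

def raw_model(data, pos):
    # Hypothetical model 0: code each input bit directly.
    return "raw", data[pos]

def diff_model(data, pos):
    # Hypothetical model 1: code whether the bit differs from the previous one.
    prev = data[pos - 1] if pos > 0 else 0
    return "diff", data[pos] ^ prev

MODELS = [raw_model, diff_model]  # the selection decision is binary, so exactly two

def run_block(model_index, data, start, state, block_bits):
    """Fill one compressed block with one model; return (symbols coded, new state)."""
    # Every model starts from the parameters left by the preceding block.
    state = {ctx: counts[:] for ctx, counts in state.items()}
    # The block begins by coding the decision to use this model.
    bits = code_cost(state, "select", model_index)
    pos = start
    # Code symbols until the block budget is used up (at least one per block).
    while pos < len(data) and (pos == start or bits < block_bits):
        context, decision = MODELS[model_index](data, pos)
        bits += code_cost(state, context, decision)
        pos += 1
    return pos - start, state

def select_models(data, block_bits=32.0):
    """Return a list of (chosen model index, input symbols coded) per block."""
    state, pos, choices = {}, 0, []
    while pos < len(data):
        trials = [run_block(i, data, pos, state, block_bits)
                  for i in range(len(MODELS))]
        # Best model: the one that coded the most input symbols in the block.
        best = max(range(len(MODELS)), key=lambda i: trials[i][0])
        consumed, state = trials[best]  # only the winner's block and state are kept
        choices.append((best, consumed))
        pos += consumed
    return choices
```

Calling `select_models` on a strictly alternating bit sequence such as `[0, 1] * 200` should favor the hypothetical differential model, since its decisions become highly predictable while the raw model stays near one bit per symbol. Because each trial stops at a compressed-size budget rather than at a symbol count, one selection decision is paid per compressed block, which is the scaling property the abstract describes. Ties go to the first model in this sketch.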