A data processor (40) keeps track of misses to a cache (71) so that multiple misses within the same cache line can be merged or folded at reload time. A load/store unit (60) includes a completed store queue (61) for presenting store requests to the cache (71) in order. If a store request misses in the cache (71), the completed store queue (61) requests the cache line from a lower-level memory system (90) and thereafter inactivates the store request. When a reload cache line is received, the completed store queue (61) compares the reload address to all entries. If at least one address matches the reload address, one entry's data is merged with the cache line prior to storage in the cache (71). Other matching entries become active and are allowed to reaccess the cache (71). A miss queue (80) coupled between the load/store unit (60) and the lower-level memory system (90) implements reload folding to improve efficiency.