A very fast, memory efficient, highly expandable, highly efficient CCNUMA processing system based on a hardware architecture that minimizes system bus contention, maximizes processing forward progress by maintaining strong ordering and avoiding retries, and implements a full-map directory structure cache coherency protocol. A Cache Coherent Non-Uniform Memory Access (CCNUMA) architecture is implemented in a system comprising a plurality of integrated modules each consisting of a motherboard and two daughterboards. The daughterboards, which plug into the motherboard, each contain two Job Processors (JPs), cache memory, and input/output (I/O) capabilities. Located directly on the motherboard are additional integrated I/O capabilities in the form of two Small Computer System Interfaces (SCSI) and one Local Area Network (LAN) interface. The motherboard includes main memory, a memory controller (MC) and directory DRAMs for cache coherency. The motherboard also includes GTL backpanel interface logic, system clock generation and distribution logic, and local resources including a micro-controller for system initialization. A crossbar switch connects the various logic blocks together. A fully loaded motherboard contains 2 JP daughterboards, two PCI expansion boards, and up to 512 MB of main memory. Each daughterboard contains two 50 MHz Motorola 88110 JP complexes, having an associated 88410 cache controller and 1 MB Level 2 Cache. A single 16 MB third level write-through cache is also provided and is controlled by a third level cache controller.