07302557 is referenced by 20 patents and cites 16 patents.

A processor method and apparatus allow the overlapped execution of multiple iterations of a loop while the compiler includes only a single copy of the loop body in the code and the hardware automatically manages which iterations are active. Because the prologue and epilogue are implicitly created and maintained within the hardware, code size is significantly reduced compared to software-only modulo scheduling. Loops with iteration counts less than the number of concurrent iterations present in the kernel are also handled automatically. This hardware-enhanced scheme achieves the same performance as the fully specified standard method. The hardware also reduces the power requirement, because the entire fetch unit can be deactivated for a portion of the loop's execution. The basic design includes, in the dispatch stage of a processor, a plurality of buffers for storing loop instructions, each associated with an instruction decoder and its respective functional unit. Control logic receives loop setup parameters and controls the selective issue of instructions from the buffers to the functional units.
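The idea of an implicit prologue and epilogue can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the patented control logic: it assumes the kernel is divided into `num_stages` stages, that one new iteration starts on each pass through the kernel, and that the hardware enables stage `s` on pass `k` only when iteration `k - s` exists. The names `active_stages` and `simulate` are hypothetical and do not come from the patent.

```python
def active_stages(kernel_pass, num_stages, num_iterations):
    """Return the kernel stages enabled on one pass through the kernel.

    Assumption for illustration: stage s works on iteration (kernel_pass - s),
    so it is enabled only when that iteration index is valid.
    """
    return [
        s for s in range(num_stages)
        if 0 <= kernel_pass - s < num_iterations
    ]


def simulate(num_stages, num_iterations):
    """List the enabled stages for every pass of a modulo scheduled loop.

    The ramp-up passes play the role of the prologue and the ramp-down
    passes play the role of the epilogue, yet only one copy of the kernel
    is ever described.
    """
    if num_iterations <= 0:
        return []
    total_passes = num_iterations + num_stages - 1
    return [
        active_stages(k, num_stages, num_iterations)
        for k in range(total_passes)
    ]


if __name__ == "__main__":
    # Five iterations through a three-stage kernel.
    for k, stages in enumerate(simulate(3, 5)):
        print(f"pass {k}: stages {stages}")
```

Under the same assumptions, `simulate(3, 2)` returns `[[0], [0, 1], [1, 2], [2]]`: with fewer iterations than kernel stages, the kernel never becomes fully populated, and the same enable rule handles that case without any special-case code.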

Title
Method and apparatus for modulo scheduled loop execution in a processor architecture
Application Number
9/728441
Publication Number
7302557 (B1)
Application Date
December 1, 2000
Publication Date
November 27, 2007
Inventor
Matthew C Merten
Champaign
IL, US
Wen-mei W Hwu
Champaign
IL, US
Agent
Pillsbury Winthrop et al
Assignee
Impact Technologies
IL, US
IPC
G06F 9/45
G06F 9/30