1
John R Nickolls, Stephen D Lew, Brett W Coon, Peter C Mills: Synchronization of threads in a cooperative thread array. NVIDIA Corporation, Townsend and Townsend and Crew, August 31, 2010: US07788468 (53 worldwide citations)

A “cooperative thread array,” or “CTA,” is a group of multiple threads that concurrently execute the same program on an input data set to produce an output data set. Each thread in a CTA has a unique thread identifier assigned at thread launch time that controls various aspects of the thread's proce ...
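
The idea in this abstract can be illustrated with a minimal sketch in plain Python, not the patented hardware mechanism. It assumes that each thread's unique ID selects which input elements it reads and which output elements it writes, and that a barrier synchronizes the threads between the two phases. The names `run_cta`, `kernel`, `partial`, and `tid` are hypothetical and chosen only for illustration.

```python
import threading

def run_cta(data, num_threads):
    """Toy cooperative thread array: every thread runs the same kernel on `data`."""
    n = len(data)
    partial = [0] * num_threads   # one slot per thread, standing in for shared memory
    output = [0] * n
    barrier = threading.Barrier(num_threads)

    def kernel(tid):
        # The thread ID decides which input elements this thread reads.
        partial[tid] = sum(data[tid::num_threads])
        # Synchronize: no thread continues until all partial sums are written.
        barrier.wait()
        total = sum(partial)
        # The thread ID also decides which output elements this thread writes.
        for i in range(tid, n, num_threads):
            output[i] = total - data[i]   # sum of all the other elements

    threads = [threading.Thread(target=kernel, args=(tid,))
               for tid in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return output
```

Calling `run_cta([1, 2, 3, 4], 2)` launches two threads that share the same code but touch different elements. Without the barrier, a thread could read `partial` before the other thread had filled its slot.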


2
John R Nickolls, Brett W Coon, Ming Y Siu, Stuart F Oberman, Samuel Liu: Single interconnect providing read and write access to a memory shared by concurrent threads. NVIDIA Corporation, Townsend and Townsend and Crew, March 16, 2010: US07680988 (43 worldwide citations)

A shared memory is usable by concurrent threads in a multithreaded processor, with any addressable storage location in the shared memory being readable and writeable by any of the threads. Processing engines that execute the threads are coupled to the shared memory via an interconnect that transfers ...


3
Brett W Coon, John Erik Lindholm: System and method for managing divergent threads in a SIMD architecture. NVIDIA Corporation, Patterson & Sheridan L, April 1, 2008: US07353369 (38 worldwide citations)

One embodiment of a computing system configured to manage divergent threads in a thread group includes a stack configured to store at least one token and a multithreaded processing unit. The multithreaded processing unit is configured to perform the steps of fetching a program instruction, determini ...


4
Brett W Coon, John Erik Lindholm, Svetoslav D Tzvetkov: Structured programming control flow using a disable mask in a SIMD architecture. NVIDIA Corporation, Patterson & Sheridan, November 10, 2009: US07617384 (29 worldwide citations)

One embodiment of a computing system configured to manage divergent threads in a SIMD thread group includes a stack configured to store state information for processing control instructions. A parallel processing unit is configured to perform the steps of determining if one or more threads diverge d ...


5
Brett W Coon, John R Nickolls, John Erik Lindholm, Svetoslav D Tzvetkov: Structured programming control flow in a SIMD architecture. NVIDIA Corporation, Patterson & Sheridan, January 25, 2011: US07877585 (27 worldwide citations)

One embodiment of a computing system configured to manage divergent threads in a SIMD thread group includes a stack configured to store state information for processing control instructions. A parallel processing unit is configured to perform the steps of determining if one or more threads diverge d ...


6
John R Nickolls, Roger L Allen, Brian K Cabral, Brett W Coon, Robert C Keller: Apparatus and method for monitoring and debugging a graphics processing unit. NVIDIA Corporation, Cooley Godward Kronish, October 6, 2009: US07600155 (24 worldwide citations)

A system has a graphics processing unit with a processor to monitor selected criteria and circuitry to initiate the storage of execution state information when the selected criteria reaches a specified state. A memory stores execution state information. A central processing unit executes a debugging ...


7
Bryon S Nordquist, Brett W Coon: Managing state information for a multi-threaded processor. NVIDIA Corporation, Kilpatrick Townsend & Stockton, December 6, 2011: US08074224 (22 worldwide citations)

Embodiments of the present invention facilitate dynamically adapting to state information changes in a graphics processing environment. In one embodiment, a master register holds state information corresponding to units of work (threads) to be performed. The state information in the master register ...


8
Brett W Coon, John Erik Lindholm, Peter C Mills, John R Nickolls: Processing an indirect branch instruction in a SIMD architecture. NVIDIA Corporation, Patterson & Sheridan, July 20, 2010: US07761697 (17 worldwide citations)

One embodiment of a computing system configured to manage divergent threads in a thread group includes a stack configured to store at least one token and a multithreaded processing unit. The multithreaded processing unit is configured to perform the steps of fetching a program instruction, determini ...


9
Peter C Mills, John Erik Lindholm, Brett W Coon, Gary M Tarolli, John Matthew Burgess: Scheduling instructions from multi-thread instruction buffer based on phase boundary qualifying rule for phases of math and data access operations with better caching. NVIDIA Corporation, Patterson & Sheridan, April 29, 2008: US07366878 (16 worldwide citations)

A processor buffers asynchronous threads. Current instructions requiring operations provided by a plurality of execution units are divided into phases, each phase having at least one math operation and at least one texture cache access operation. Instructions within each phase are qualified and prio ...


10
Norbert Juffa, Brett W Coon: Maximized memory throughput using cooperative thread arrays. NVIDIA Corporation, Kilpatrick Townsend & Stockton, April 12, 2011: US07925860 (15 worldwide citations)

In parallel processing devices, for streaming computations, processing of each data element of the stream may not be computationally intensive and thus processing may take relatively small amounts of time to compute as compared to the memory access times required to read the stream and write the resul ...