GeForce 600 series

GeForce 600 Series
Cards
Release date	March 22, 2012
Codename	GK10x
Architecture	Kepler
Models	GeForce Series, GeForce GT Series, GeForce GTX Series
Fabrication process and transistors	292M 40 nm (GF119); 585M 40 nm (GF108); 1,170M 40 nm (GF116); 1,950M 40 nm (GF114); 1,270M 28 nm (GK107); 1,270M 28 nm (GK208); 2,540M 28 nm (GK106); 3,540M 28 nm (GK104)
Entry-level	GT 610, GT 620, GT 630, GT 640
Mid-range	GTX 650, GTX 650 Ti, GTX 650 Ti Boost, GTX 660
High-end	GTX 660 Ti, GTX 670
Enthusiast	GTX 680, GTX 690
API support
Direct3D	Direct3D 12.0 (feature level 11_0)
OpenCL	OpenCL 1.2
OpenGL	OpenGL 4.5
Vulkan	Vulkan 1.0 SPIR-V
History
Predecessor	GeForce 500 series
Successor	GeForce 700 series

The GeForce 600 series is a family of graphics processing units developed by Nvidia, used in desktop and laptop PCs. It introduced the Kepler architecture (GK-codenamed chips), named after the German mathematician, astronomer, and astrologer Johannes Kepler. GeForce 600 series cards were first released in 2012.

Where the goal of the previous architecture, Fermi, was to increase raw performance (particularly for compute and tessellation), Nvidia's goal with the Kepler architecture was to increase performance per watt while still striving for overall performance increases. The primary way Nvidia achieved this was by using a unified clock. Abandoning the shader clock found in its previous GPU designs increases efficiency, even though more cores are required to reach similar levels of performance. This is not only because the cores are more power efficient (two Kepler cores use about 90% of the power of one Fermi core, according to Nvidia's numbers), but also because the reduction in clock speed delivers a 50% reduction in power consumption in that area.
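The efficiency argument above can be checked with a little arithmetic. The following is a rough sketch, not Nvidia's methodology: it assumes that throughput scales linearly with clock, that the Fermi shader clock is exactly twice the core clock, and it normalizes the power of one Fermi core to 1.0. Only the 90% figure comes from the text; the names are illustrative.

```python
# Rough perf-per-watt comparison, normalized to one Fermi core.
# Assumptions (not from Nvidia documentation): throughput scales linearly
# with clock, and the Fermi shader clock is exactly 2x the core clock.

FERMI_CORE_POWER = 1.0    # normalized power of one Fermi core
KEPLER_PAIR_POWER = 0.9   # two Kepler cores, per the figure quoted above


def ops_per_core_clock(cores: int, clock_multiplier: float) -> float:
    """Operations retired per core-clock cycle by `cores` cores."""
    return cores * clock_multiplier


# One double-pumped Fermi core versus two Kepler cores on the unified clock.
fermi_ops = ops_per_core_clock(cores=1, clock_multiplier=2.0)
kepler_ops = ops_per_core_clock(cores=2, clock_multiplier=1.0)

fermi_perf_per_watt = fermi_ops / FERMI_CORE_POWER
kepler_perf_per_watt = kepler_ops / KEPLER_PAIR_POWER

relative_gain = kepler_perf_per_watt / fermi_perf_per_watt

print(f"Throughput matched: {fermi_ops == kepler_ops}")
print(f"Kepler perf/W relative to Fermi: {relative_gain:.2f}x")
```

Under these assumptions the two Kepler cores match the throughput of one Fermi core while drawing 90% of the power, which works out to roughly an 11% gain in performance per watt from the cores alone.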

Kepler also introduced a new form of texture handling known as bindless textures. Previously, textures needed to be bound by the CPU to a particular slot in a fixed-size table before the GPU could reference them. This led to two limitations. First, because the table was fixed in size, there could only be as many textures in use at one time as could fit in the table (128). Second, the CPU was doing unnecessary work: it had to load each texture and also bind each texture loaded in memory to a slot in the binding table. With bindless textures, both limitations are removed. The GPU can access any texture loaded into memory, increasing the number of available textures and removing the performance penalty of binding.
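The difference between the two models can be illustrated with a toy simulation. This is a conceptual sketch only: it calls no graphics API, and the class and method names (`BoundTextureTable`, `BindlessTextures`, `bind`, `load`, `sample`) are hypothetical. The 128-slot limit is the figure quoted above.

```python
# Toy model contrasting slot-based binding with bindless access.
# Conceptual illustration only; no real graphics API is used.
from typing import Dict, List, Optional

MAX_SLOTS = 128  # fixed binding-table size quoted in the text


class BoundTextureTable:
    """Slot-based model: the CPU binds each texture to a numbered slot."""

    def __init__(self) -> None:
        self.slots: List[Optional[str]] = [None] * MAX_SLOTS
        self.bind_calls = 0

    def bind(self, slot: int, texture: str) -> None:
        if not 0 <= slot < MAX_SLOTS:
            raise IndexError(f"slot {slot} outside the {MAX_SLOTS}-entry table")
        self.slots[slot] = texture
        self.bind_calls += 1  # CPU work on every bind

    def sample(self, slot: int) -> str:
        texture = self.slots[slot]
        if texture is None:
            raise LookupError(f"no texture bound to slot {slot}")
        return texture


class BindlessTextures:
    """Bindless model: a texture is referenced directly by its handle."""

    def __init__(self) -> None:
        self.memory: Dict[int, str] = {}
        self._next_handle = 0

    def load(self, texture: str) -> int:
        handle = self._next_handle
        self._next_handle += 1
        self.memory[handle] = texture
        return handle  # plain data handed to the GPU; no bind step

    def sample(self, handle: int) -> str:
        return self.memory[handle]


# The bindless model is limited only by what fits in memory.
bindless = BindlessTextures()
handles = [bindless.load(f"tex{i}") for i in range(1000)]

# The slot-based model rejects anything past the end of the table.
table = BoundTextureTable()
table.bind(0, "tex0")
```

In the slot-based model every texture in use costs a `bind` call and competes for one of 128 slots; in the bindless model the handle returned by `load` is all that is needed to reach the texture.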

Finally, with Kepler, Nvidia was able to increase the memory clock to 6 GHz. To accomplish this, Nvidia needed to design an entirely new memory controller and bus. While still shy of the theoretical 7 GHz limit of GDDR5, this is well above the 4 GHz speed of Fermi's memory controller.
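To see what these clock figures mean for bandwidth, the sketch below converts them to peak transfer rates. It makes two assumptions that the text does not state: that the quoted GHz figures are effective per-pin data rates, and that the memory bus is 256 bits wide (chosen purely for illustration). The function name is hypothetical.

```python
# Peak memory bandwidth from an effective data rate and a bus width.
# Assumptions: the GHz figures are effective per-pin data rates, and the
# 256-bit bus width is an illustrative value not given in the text.

BUS_WIDTH_BITS = 256


def peak_bandwidth_gb_s(effective_rate_ghz: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s: transfers per second times bytes per transfer."""
    return effective_rate_ghz * bus_width_bits / 8


fermi_bw = peak_bandwidth_gb_s(4.0, BUS_WIDTH_BITS)
kepler_bw = peak_bandwidth_gb_s(6.0, BUS_WIDTH_BITS)
gddr5_limit_bw = peak_bandwidth_gb_s(7.0, BUS_WIDTH_BITS)

print(f"Fermi-era controller (4 GHz): {fermi_bw:.0f} GB/s")
print(f"Kepler controller (6 GHz):    {kepler_bw:.0f} GB/s")
print(f"GDDR5 limit (7 GHz):          {gddr5_limit_bw:.0f} GB/s")
```

At a fixed bus width, bandwidth scales linearly with the data rate, so moving from 4 GHz to 6 GHz is a 50% increase regardless of the width assumed.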

The GeForce 600 Series contains products from both the older Fermi and newer Kepler generations of Nvidia GPUs. Kepler-based members of the 600 series add the following standard features to the GeForce family:

The Kepler architecture employs a new streaming multiprocessor design called SMX. The SMX is key to Kepler's power efficiency because the whole GPU runs from a single "Core Clock" rather than the double-pumped "Shader Clock". According to Nvidia, two Kepler CUDA cores consume about 90% of the power of one Fermi CUDA core. Because each core now runs at the lower unified clock, the SMX needs additional processing units to execute a whole warp per cycle. Kepler also needed to increase raw GPU performance to remain competitive. As a result, the SMX doubles the CUDA cores per array from 16 to 32, the CUDA core arrays from 3 to 6, and the load/store and SFU groups from 1 each to 2 each. The scheduling resources are doubled as well: warp schedulers go from 2 to 4, dispatch units from 4 to 8, and the register file doubles to 64K entries. Because doubling these units and resources consumes die space, the PolyMorph Engine was not doubled but enhanced, making it capable of outputting a polygon in 2 cycles instead of 4. With Kepler, Nvidia worked on area efficiency as well as power efficiency. Since the standard Kepler CUDA cores are not FP64-capable, Nvidia opted to use eight dedicated FP64 CUDA cores per SMX to save die space while still offering FP64 capability. The net result of these changes is higher graphics performance at the expense of FP64 performance.
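The unit counts above can be tallied to see their combined effect. The sketch below uses only the per-multiprocessor figures given in the paragraph, plus two labeled assumptions: a warp size of 32 threads (the standard CUDA warp size) and a Fermi shader clock of exactly twice the core clock. The `MultiprocessorConfig` class and its field names are hypothetical.

```python
# Tally of per-multiprocessor resources, Fermi SM versus Kepler SMX.
# Unit counts come from the paragraph above. Assumptions: warp size of 32
# threads, and a Fermi shader clock of exactly 2x the core clock.
from dataclasses import dataclass

WARP_SIZE = 32
KEPLER_FP64_CORES_PER_SMX = 8


@dataclass(frozen=True)
class MultiprocessorConfig:
    cores_per_array: int
    core_arrays: int
    warp_schedulers: int
    dispatch_units: int
    clock_multiplier: int  # core executions per core-clock cycle

    @property
    def total_cores(self) -> int:
        return self.cores_per_array * self.core_arrays

    @property
    def lanes_per_array_per_clock(self) -> int:
        """Thread slots one array can execute per core-clock cycle."""
        return self.cores_per_array * self.clock_multiplier


fermi_sm = MultiprocessorConfig(16, 3, 2, 4, clock_multiplier=2)
kepler_smx = MultiprocessorConfig(32, 6, 4, 8, clock_multiplier=1)

# Doubling both cores per array and arrays quadruples the total core count.
core_ratio = kepler_smx.total_cores / fermi_sm.total_cores

# Fraction of an SMX's CUDA cores matched by a dedicated FP64 core.
fp64_ratio = KEPLER_FP64_CORES_PER_SMX / kepler_smx.total_cores

print(f"Fermi SM cores:   {fermi_sm.total_cores}")
print(f"Kepler SMX cores: {kepler_smx.total_cores} ({core_ratio:.0f}x)")
print(f"FP64 to FP32 core ratio per SMX: 1/{round(1 / fp64_ratio)}")
```

Two points fall out of the tally. Each array covers a full 32-thread warp per core-clock cycle in both designs, but Fermi reaches that with 16 double-pumped cores while Kepler uses 32 cores at the unified clock. And because both the cores per array and the number of arrays double, the total core count per multiprocessor quadruples, against which eight FP64 cores amount to a 1/24 ratio.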

...
Wikipedia