- For a long time, the only way to get better performance was to pay for the most powerful CPU. Now the GPU (Graphics Processing Unit) on your graphics card can take over some of that work.
- Four years ago (as of 2010), Nvidia developed a programming environment called CUDA (Compute Unified Device Architecture), which lets parts of a program run on the graphics chip. Only Nvidia chips from the GeForce 8 series onwards support it.
- AMD, for its part, supports OpenCL, an open standard managed by the Khronos Group industry consortium and also supported by Nvidia. With OpenCL, a program can spread its workload across any OpenCL-compatible processors, CPUs as well as GPUs.
- Microsoft has also embraced this development: DirectX 11 adds a new interface, DirectCompute, for running general-purpose program code on the GPU.
- Consider counting how many times a particular word appears in a book. A CPU starts at page 1, reads the text word by word, and finishes at the last page. A GPU instead divides the book into many small parts, distributes them among its stream cores, and counts the occurrences in all parts at once, finishing in a fraction of the time.
- Because of how GPUs are designed, a program must be divided into at least 240 parts (240 threads) to keep all 240 stream cores of one of the GeForce GTX 295's two GPUs busy. That is not easy: many programs cannot be parallelized at all, or only with great difficulty. Even current CPUs face the same problem, since a program must be split into 8 threads to use the 8 logical cores (4 physical cores with Hyper-Threading) of a quad-core Core i7.
- In practice, the workloads that benefit most come from video editing and scientific computing. Instead of book pages, they involve large matrices of floating-point numbers, where exactly the same additions and multiplications are repeated across thousands of values.
- In the future, for a program to run at lightning speed, its work will have to be divided into many threads from the moment it is written, with no processing step depending on the result of the previous one. Only then is the program fully parallelized.
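The book-counting analogy in the list above maps onto a simple split, count, and sum pattern. The following is a minimal Python sketch, assuming the book is plain text and a "word" is any whitespace-separated token; the function names are hypothetical, and a thread pool stands in for the GPU's stream cores. Python threads give no real speedup on CPU-bound work like this, so the sketch shows the structure of the computation, not its performance.

```python
from concurrent.futures import ThreadPoolExecutor


def count_word(words, target):
    """Count occurrences of target in one worker's share of the words."""
    return sum(1 for w in words if w == target)


def split_into_parts(words, n_parts):
    """Divide the word list into at most n_parts contiguous, nearly equal slices.

    Splitting the list of words (not the raw text) guarantees that no word
    is cut in half at a part boundary.
    """
    size = max(1, -(-len(words) // n_parts))  # ceiling division, never zero
    return [words[i:i + size] for i in range(0, len(words), size)]


def sequential_count(text, target):
    # "CPU" style: walk the whole text from the first word to the last.
    return count_word(text.split(), target)


def parallel_count(text, target, n_workers=4):
    # "GPU" style: split the text, count every part independently, then add up.
    parts = split_into_parts(text.split(), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(count_word, parts, [target] * len(parts)))


book = "the cat sat on the mat and the dog sat on the log"
print(sequential_count(book, "the"))  # 4
print(parallel_count(book, "the"))    # 4
```

Both versions return the same count; only the order in which the words are examined differs, which is exactly why this task divides so easily.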
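The difficulty of spreading a program across 240 GPU cores or 8 CPU threads can be quantified with Amdahl's law, a standard result not taken from the original article. If a fraction $p$ of a program's work can run in parallel on $N$ cores and the rest must run serially, the best-case speedup $S$ is shown below. The value $p = 0.9$ is a hypothetical choice for illustration, and the formula ignores real-world overheads such as copying data to the graphics card.

$$
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}}
$$

$$
S(240) = \frac{1}{0.1 + \dfrac{0.9}{240}} = \frac{1}{0.10375} \approx 9.64,
\qquad
S(8) = \frac{1}{0.1 + \dfrac{0.9}{8}} = \frac{1}{0.2125} \approx 4.71
$$

Even with 240 cores, a program that is 10% serial can never run more than $1/(1-p) = 10$ times faster, so the part that cannot be parallelized matters as much as the number of cores.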
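The matrix workloads mentioned in the list suit GPUs because every output element is computed with the same operations and independently of the others. The following Python sketch (plain lists of lists, hypothetical function names) computes each cell of a matrix product on its own, then shows that filling the cells in a shuffled order gives the same result; on a GPU, each cell could be assigned to its own thread.

```python
import random


def cell(a, b, i, j):
    """Compute one element C[i][j] of C = A x B as a chain of multiply-adds."""
    total = 0.0
    for k in range(len(b)):
        total += a[i][k] * b[k][j]
    return total


def matmul(a, b):
    """Multiply an n x m matrix by an m x p matrix, cell by cell in row order."""
    n, p = len(a), len(b[0])
    return [[cell(a, b, i, j) for j in range(p)] for i in range(n)]


def matmul_any_order(a, b, seed=0):
    """Compute the same product, but fill the cells in a shuffled order.

    No cell reads another cell's result, so the order does not matter.
    """
    n, p = len(a), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    jobs = [(i, j) for i in range(n) for j in range(p)]
    random.Random(seed).shuffle(jobs)
    for i, j in jobs:
        c[i][j] = cell(a, b, i, j)
    return c


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
print(matmul(A, B))                            # [[19.0, 22.0], [43.0, 50.0]]
print(matmul_any_order(A, B) == matmul(A, B))  # True
```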
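The requirement that no processing step depend on the previous one can be illustrated with two small loops. In this Python sketch (hypothetical function names, with `x * x + 1` as an arbitrary stand-in computation), the first function's steps are independent and can run in any order, while the second is a chain in which each step needs the previous result and therefore cannot be split across threads as written.

```python
def independent_steps(xs):
    """Apply the same computation to every input; no step reads another's result."""
    return [x * x + 1 for x in xs]


def dependent_steps(x0, n):
    """Repeat the computation n times, each time feeding in the previous result."""
    values = [x0]
    for _ in range(n):
        # Step k needs values[k - 1], so it cannot start until that step is done.
        values.append(values[-1] * values[-1] + 1)
    return values


xs = [0, 1, 2, 3]
forward = independent_steps(xs)
# Processing the inputs in reverse order gives the same per-element results,
# so the work could just as well be handed to separate threads.
backward = list(reversed(independent_steps(list(reversed(xs)))))
print(forward)                # [1, 2, 5, 10]
print(forward == backward)    # True
print(dependent_steps(0, 4))  # [0, 1, 2, 5, 26]
```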
Processing speed therefore depends not only on the hardware but also on software written to take advantage of it.
Reference: CHIP Magazine, April 2010