- What?
GPGPU (General-Purpose computing on Graphics Processing Units): using the graphics card to perform computations traditionally handled by the CPU.
- Why?
Even if its per-core raw performance is lower than a CPU's, a GPU is well suited to massively parallel computations.
- Who?
Mainly Nvidia (CUDA), AMD (Stream) and the Khronos Group (OpenCL); see the links at the end of this post.
- How?
Through dedicated toolkits such as CUDA, Stream or OpenCL, which let you write GPU code in an extension of C.
- A small example?
void compute(float A[N][N], float B[N][N], float C[N][N]){
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            C[i][j] = A[i][j] + B[i][j];
}

int main(){
    compute(A, B, C);
}

And the GPU-enabled version of this algorithm (in Nvidia CUDA):
__global__ void compute(float A[N][N], float B[N][N], float C[N][N]){
    int i = threadIdx.x;
    int j = threadIdx.y;
    C[i][j] = A[i][j] + B[i][j];
}

int main(){
    // call the GPU kernel
    compute<<<N_blocks, N_threads>>>(A, B, C);
}

Actually, the GPU compiler handles all the parallelization of the algorithm according to the kernel definition (the __global__ tag). During execution, the kernel is run simultaneously by N_blocks blocks of N_threads threads.
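The listing above is deliberately simplified: it passes host arrays straight to the kernel and leaves N_blocks and N_threads undefined. A real CUDA program must first copy its data to the graphics card and copy the result back afterwards. Here is a minimal, self-contained sketch of the same matrix addition with explicit memory management (my own illustration, with an assumed size N and block size, not code from the original post):

    #include <cstdio>
    #include <cuda_runtime.h>

    #define N 1024  // assumed matrix dimension, chosen for illustration

    // Each thread adds one element; blockIdx/blockDim extend the indexing
    // beyond a single block, and the bounds check covers the case where
    // N*N is not a multiple of the block size.
    __global__ void add(const float *A, const float *B, float *C, int n){
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx < n)
            C[idx] = A[idx] + B[idx];
    }

    int main(){
        size_t bytes = N * N * sizeof(float);
        float *hA = (float*)malloc(bytes);
        float *hB = (float*)malloc(bytes);
        float *hC = (float*)malloc(bytes);
        for (int i = 0; i < N * N; i++){ hA[i] = 1.0f; hB[i] = 2.0f; }

        // allocate device memory and copy the inputs to the GPU
        float *dA, *dB, *dC;
        cudaMalloc((void**)&dA, bytes);
        cudaMalloc((void**)&dB, bytes);
        cudaMalloc((void**)&dC, bytes);
        cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

        // launch enough blocks of 256 threads to cover all N*N elements
        int threads = 256;
        int blocks = (N * N + threads - 1) / threads;
        add<<<blocks, threads>>>(dA, dB, dC, N * N);

        // copy the result back to the host
        cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
        printf("C[0] = %f\n", hC[0]);  // expected: 3.0

        cudaFree(dA); cudaFree(dB); cudaFree(dC);
        free(hA); free(hB); free(hC);
    }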
- Applications?
The main restriction is related to memory management, as hundreds of threads have to share a restricted amount of memory (about 1 GB).
However, more and more applications in matrix computation, image and signal processing, or Monte Carlo simulation have already demonstrated the potential of this solution.
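As an illustration of why image processing maps so well to this model, here is a sketch of a per-pixel grayscale-conversion kernel (my own example assuming a packed-RGB image layout, not taken from any of the toolkits below):

    // Converts a packed RGB image to grayscale, one thread per pixel.
    // (Illustrative sketch: 'width' and 'height' are assumed parameters.)
    __global__ void rgb_to_gray(const unsigned char *rgb, unsigned char *gray,
                                int width, int height){
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x < width && y < height){
            int i = y * width + x;
            // standard luminance weights for R, G and B
            gray[i] = (unsigned char)(0.299f * rgb[3*i]
                                    + 0.587f * rgb[3*i + 1]
                                    + 0.114f * rgb[3*i + 2]);
        }
    }

    // launched, for example, with one 16x16 thread block per image tile:
    //   dim3 block(16, 16);
    //   dim3 grid((width + 15) / 16, (height + 15) / 16);
    //   rgb_to_gray<<<grid, block>>>(d_rgb, d_gray, width, height);

Every pixel is computed independently, so the whole image is processed in parallel with no synchronization between threads.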
Links:
Nvidia CUDA: http://www.nvidia.fr/object/cuda_home_new.html
AMD Stream: http://www.amd.com/stream
OpenCL: http://www.khronos.org/opencl/
1 comment:
Technically, I suppose it should be mentioned that GPGPU computing is really only useful for "SIMD" parallel computing - where you have the same process to apply to a lot of pieces of data.
Obviously, this is better for some things than others, and the speed-ups from GPGPU vary dramatically for the various use cases - anything from 4x to 200x depending on the problem.
(Image processing is, of course, a trivial example of a really good application. After all, GPUs were originally designed the way they are to... process graphics.)