- What?

- Why?

Even if its per-core raw performance is lower than a CPU's, a GPU is well suited to massively parallel computations.

- Who?

- How?

- A small example?

```c
void compute(float A[N][N], float B[N][N], float C[N][N]){
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            C[i][j] = A[i][j] + B[i][j];
}

int main(){
    compute(A, B, C);
}
```

And the GPU-enabled version of this algorithm (in Nvidia CUDA):

```cuda
__global__ void compute(float A[N][N], float B[N][N], float C[N][N]){
    int i = threadIdx.x;
    int j = threadIdx.y;
    C[i][j] = A[i][j] + B[i][j];
}

int main(){
    // call the GPU kernel
    compute<<<N_blocks, N_threads>>>(A, B, C);
}
```

The GPU compiler handles all the parallelization of the algorithm according to the kernel definition (the `__global__` tag). During execution, the kernel is run simultaneously by N_blocks blocks of N_threads threads each.
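Note that `threadIdx` alone only indexes threads within a single block, so the sketch above covers only matrices that fit in one block. Here is a minimal sketch of the usual generalization, combining `blockIdx` and `threadIdx` into global coordinates; the name `addMatrix`, the `d_` pointer names, and the flat-buffer layout are illustrative assumptions, not part of the original example:

```cuda
// Sketch: each thread derives its global 2D coordinates from its block,
// so the grid can cover matrices larger than a single block.
__global__ void addMatrix(const float *A, const float *B, float *C, int n){
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global row index
    int j = blockIdx.y * blockDim.y + threadIdx.y;   // global column index
    if (i < n && j < n)                              // guard against overshoot
        C[i * n + j] = A[i * n + j] + B[i * n + j];
}

// Possible launch: 16x16 threads per block, enough blocks to cover n x n.
// dim3 threads(16, 16);
// dim3 blocks((n + 15) / 16, (n + 15) / 16);
// addMatrix<<<blocks, threads>>>(d_A, d_B, d_C, n);
```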

- Applications?

The main restriction is related to memory management, as hundreds of threads have to share a limited amount of memory (1 GB).
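In practice this means the host has to manage the device memory explicitly: inputs are copied into the GPU's limited memory before the kernel runs, and results are copied back afterwards. A hedged sketch of that bookkeeping, reusing the illustrative `addMatrix` kernel and `d_` names from above:

```cuda
// Host-side sketch of explicit device-memory management (illustrative names).
float *d_A, *d_B, *d_C;
size_t bytes = (size_t)N * N * sizeof(float);
cudaMalloc(&d_A, bytes);                            // allocate in device memory
cudaMalloc(&d_B, bytes);
cudaMalloc(&d_C, bytes);
cudaMemcpy(d_A, A, bytes, cudaMemcpyHostToDevice);  // copy inputs onto the GPU
cudaMemcpy(d_B, B, bytes, cudaMemcpyHostToDevice);
addMatrix<<<blocks, threads>>>(d_A, d_B, d_C, N);   // run the kernel
cudaMemcpy(C, d_C, bytes, cudaMemcpyDeviceToHost);  // copy the result back
cudaFree(d_A); cudaFree(d_B); cudaFree(d_C);        // release device memory
```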

However, more and more applications in matrix computation, image and signal processing, and Monte Carlo simulation have already demonstrated the potential of this solution.

Links:

- Nvidia CUDA: http://www.nvidia.fr/object/cuda_home_new.html
- AMD Stream: http://www.amd.com/stream
- OpenCL: http://www.khronos.org/opencl/

## 1 comment:

Technically, I suppose it should be mentioned that GPGPU computing is really only useful for "SIMD" parallel computing - where you have the same process to apply to lots of pieces of data.

Obviously, this is better for some things than others, and the speed-ups from GPGPU vary dramatically for the various use cases - anything from 4x to 200x depending on the problem.

(Image processing is, of course, a trivial example of a really good application. After all, GPUs were originally designed the way they are to... process graphics.)
