In this talk, Arnaud introduces a mishmash of tips and tricks to get the best performance out of your kernels on GPUs.
He discusses some of the differences between GPU and CPU hardware and some of the changes you can make at the API and code level to improve performance. He also covers some seemingly innocuous details that can have a large impact on performance.
For a much more in-depth treatment, see NVIDIA's guide to optimizing CUDA code: http://docs.nvidia.com/cuda/cuda-c-programming-guide/#performance-guidelines
AMD also publishes a guide to optimizing OpenCL code for its devices: http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/opencl-optimization-guide/
When optimizing, always time your changes to make sure they actually improve performance.