Debugging CUDA-Accelerated Parallel Applications with TotalView
CUDA introduces developers to a number of new concepts (such
as kernels, streams, warps, and an explicitly managed, multi-level memory
hierarchy) that are not encountered in serial code or in other parallel
programming paradigms. Visibility into these elements is critical for
troubleshooting and tuning applications that use CUDA. This paper highlights
CUDA concepts as implemented in CUDA 3.0 through 4.0, their impact on
troubleshooting CUDA applications, and how TotalView helps users work with
these CUDA-specific constructs. CUDA is
frequently used alongside MPI parallelism and host-side multi-core and
multi-thread parallelism. The TotalView parallel debugger provides developers with
an integrated view of all three levels of parallelism within a single debugging
session.