CUDA 5.0 and 5.5
TotalView now supports CUDA versions 5.0 and 5.5. With CUDA 5.0, TotalView does not support dynamic parallelism. With CUDA 5.5, support for dynamic parallelism is limited: you should be able to use TotalView under the CUDA 5.5 runtime with applications that use dynamic parallelism, but we plan to improve how TotalView displays the relationships between dynamically launched kernels and how you navigate among the running kernels.
Xeon Phi Memory Debugging
TotalView's support for Intel Xeon Phi (MIC architecture) has been extended to include memory debugging with MemoryScape for Native and Symmetric modes. MemoryScape functionality is not yet available for Offload mode programs.
Xeon Phi Symmetric Mode
Xeon Phi support has been further extended to include Xeon Phi Symmetric Mode across a Xeon Phi cluster.
Mac OS X Mavericks
TotalView now supports the Macintosh OS X Mavericks platform. Please check the TotalView Reference Guide, Part III, “Platforms and Operating Systems,” for special installation instructions.
Scalable Data Aggregation
Some CLI commands, including dstatus, now provide options to aggregate the data they display, making it much easier to understand the status and location of all the processes and threads being debugged.
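The kind of aggregation described above can be illustrated with a small C++ sketch. It groups process ranks by state and compresses each group into range notation such as 0-2,5. The state names, the output format, and the functions compress_ranks and aggregate_status are hypothetical; this is not how dstatus is implemented or what it prints.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: collapse a sorted list of ranks into range
// notation, e.g. {0, 1, 2, 5} -> "0-2,5".
std::string compress_ranks(const std::vector<int>& ranks) {
    std::string out;
    std::size_t i = 0;
    while (i < ranks.size()) {
        std::size_t j = i;
        // Extend the run while the next rank is consecutive.
        while (j + 1 < ranks.size() && ranks[j + 1] == ranks[j] + 1) ++j;
        if (!out.empty()) out += ",";
        out += std::to_string(ranks[i]);
        if (j > i) out += "-" + std::to_string(ranks[j]);
        i = j + 1;
    }
    return out;
}

// Hypothetical helper: map each state to the compressed list of ranks in
// that state. Ranks are visited in ascending order, so each group is sorted.
std::map<std::string, std::string>
aggregate_status(const std::vector<std::string>& state_by_rank) {
    std::map<std::string, std::vector<int>> groups;
    for (std::size_t r = 0; r < state_by_rank.size(); ++r)
        groups[state_by_rank[r]].push_back(static_cast<int>(r));
    std::map<std::string, std::string> summary;
    for (const auto& kv : groups)
        summary[kv.first] = compress_ranks(kv.second);
    return summary;
}
```

With thousands of processes, one line per state replaces one line per process, which is the readability gain aggregation is meant to provide.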
Improved Breakpoint Performance
Creating and using breakpoints is now significantly faster when debugging very large applications.
Early Access: Tree-based Scalable Debugging Infrastructure
TotalView's early access support for tree-based scalability has noticeably improved performance. This support is based on the MRNet overlay tree network, and is available to selected customers on request. This release also includes a new scalable root window that aggregates data about many processes rather than listing them out one per line.
FLEXLM and Security
TotalView now uses a more recent version of FLEXLM, which offers improved security.
Platforms and Compilers
This release adds support for new versions of operating systems and compilers. For a complete list of supported platforms, see the document TotalView Platforms and System Requirements.
This release introduces the Sessions Manager, which allows you to set the configuration for a debugging session and preserve it from session to session. The Sessions Manager also provides a single, centralized interface for initiating debugging sessions. See the Getting Started Guide and User Guide for information on this new feature.
Xeon Phi Support
TotalView now fully supports debugging applications that take advantage of Intel Xeon Phi coprocessors. TotalView supports debugging of host programs that use Intel Language Extensions for Offloading (LEO) as well as MPI and OpenMP programs that are compiled to run natively on the Xeon Phi. TotalView supports multiple Xeon Phi coprocessors per node. For more information, see the Xeon Phi Debugging documentation.
Support for Intel AVX and AVX2, and AMD XOP instruction sets
On Linux for the x86 and x86-64 architectures, TotalView's disassembler now supports most instructions from Intel's AVX and AVX2 sets and AMD's XOP set. As a result, the assembler code view displays these instructions, and single-stepping works in code that contains them. ReplayEngine reverse debugging does not yet support these instructions.
Enhanced Addresses Dialog
Setting action points on templatized or overloaded functions can produce a large number of individual action points, particularly in massively parallel programs. The new Addresses dialog for action points lets you enable only the subset of these action points that pertains to the problem you are debugging.
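To see why a single action point can expand into many, consider the following C++ fragment; the names describe, twice, and exercise are hypothetical. In an unoptimized build, each overload of describe and each instantiation of twice is a separate function with its own code address, so an action point set by name can resolve to several locations, and a parallel job repeats them in every process.

```cpp
#include <cassert>
#include <string>

// Two overloads: distinct functions that share the name "describe".
std::string describe(int)    { return "int"; }
std::string describe(double) { return "double"; }

// A function template: each distinct T used below yields a separate
// instantiation with its own machine code.
template <typename T>
T twice(T value) { return value + value; }

// Uses three instantiations of twice<T> and both overloads of describe,
// so a breakpoint on "twice" or "describe" has several candidate addresses.
std::string exercise() {
    int a = twice(2);                          // twice<int>
    double b = twice(1.5);                     // twice<double>
    std::string c = twice(std::string("ab"));  // twice<std::string>
    return describe(a) + "," + describe(b) + "," + c;
}
```

If only the std::string case is misbehaving, enabling just that instantiation's address avoids stopping in the int and double versions.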
Cray ATP Support
Cray Abnormal Termination Processing (ATP) stops program execution at the time of a crash so you can debug the problem. TotalView now makes it easy to attach to such a held process.
STL Container Support
The TotalView STLView feature now includes support for the STL containers set, multiset, and multimap.
Support for Mac OS X Lion and Mountain Lion
TotalView now supports the Macintosh OS X Lion and Mountain Lion platforms. Please check the TotalView Installation Guide for special installation instructions.
Blue Gene/Q Support
Support has been added for the latest generation of IBM Blue Gene supercomputers, the Blue Gene/Q. TotalView's Blue Gene/Q support includes hybrid MPI+OpenMP debugging, asynchronous thread control, memory debugging, dynamic library support, C++View, binary corefile debugging, fast action points, fast DLL debugging, and MPI message queue display.
Early Access Xeon Phi Debugging
TotalView provides early access debugging for applications that take advantage of the Intel Xeon Phi coprocessors. TotalView supports debugging of host programs that use Intel Language Extensions for Offloading (LEO) as well as MPI and OpenMP programs that are compiled to run natively on the Xeon Phi. TotalView supports multiple Xeon Phi coprocessors per node.
NVIDIA CUDA 4.2
TotalView's CUDA support has been updated to work with CUDA 4.2.
TotalView includes support for the OpenACC directives that are part of Cray CCE 8 compilers. These are frequently used on Cray XK supercomputers to easily take advantage of NVIDIA Fermi and Kepler accelerators.
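For illustration, the following C++ sketch shows a loop annotated with a standard OpenACC directive, the kind of code these directives appear in. The function name saxpy and the particular data clauses are illustrative assumptions, not taken from Cray's documentation. A compiler without OpenACC support ignores the pragma and runs the loop on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// y = a*x + y. With an OpenACC-capable compiler the loop may be offloaded
// to an accelerator: x is copied to the device, y is copied in and back out.
void saxpy(float a, const std::vector<float>& xv, std::vector<float>& yv) {
    assert(xv.size() == yv.size());
    const std::size_t n = yv.size();
    const float* x = xv.data();
    float* y = yv.data();
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```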
Early Access Preview of the MRNet Infrastructure
Selected customers will have the opportunity to try an optimized version of TotalView that uses the Multi-cast/Reduction Network (MRNet) as a communication layer between the TotalView main process and the TotalView debug agents.
Getting Started Guide and Documentation Improvements
A Getting Started Guide has been added to TotalView’s documentation, providing a quick jumpstart to using TotalView, MemoryScape and ReplayEngine. In addition, the table of contents has been reorganized to allow you to more easily find features and instructions.
Enhanced and Extended CUDA Support
Support has been added for the NVIDIA CUDA SDK 4.1 tool chain on Linux x86 64-bit clusters. We've also added support for Cray XK supercomputers, which are x86-64 based but also feature Cray-specific software and the proprietary Gemini interconnect.
ReplayEngine on Demand
The new ReplayEngine on Demand feature allows developers to use ReplayEngine more flexibly. ReplayEngine can now be enabled on a running application, even in the middle of a debugging session. Subsequent program execution is recorded and can be reviewed bi-directionally with ReplayEngine. Formerly, ReplayEngine had to be enabled when the application was initiated.
ReplayEngine on Cray XT
TotalView now provides ReplayEngine reverse debugging for both serial and parallel (MPI) programs running on the Cray XE. Previously, this was available for servers, workstations, and clusters based on Ethernet or InfiniBand interconnect technology.
Support for C++View in ReplayEngine
The C++View feature in TotalView allows developers to easily reformat their data objects. With this release developers can now use C++View transforms in conjunction with ReplayEngine recording and deterministic replay.
Enhanced TVScript Scalability
TVScript scalability has been improved so that it can be used for batch debugging of large-scale MPI applications with jobs of 1024 processes.
Enhanced Dive Visibility
When the cursor hovers over a divable object (a function or variable) in the TotalView source pane, a red dotted-line box appears around the object's text, indicating that you can dive on it.
This version of TotalView also delivers:
- Dynamic memory debugging on the BlueGene/P
- Early access support for the new OpenACC directives in the Cray Compiler Edition version 8
- Support for AIX 7.1
- Support for Fedora 15
- Support for the Cray CCE Compilers
Support for CUDA 4.0
TotalView 8.9.2 has full support for CUDA 4.0 and is also fully backwards compatible with CUDA 3.1 and 3.2.
Support for CUDA 3.2
Full CUDA 3.2 device register, context, call stack, exception and pinned memory support has been added. TotalView 8.9.1 is also fully backwards compatible with CUDA 3.0 and 3.1.
Display Array Statistics from the TotalView CLI
Command line interface (CLI) users can now display array statistics information; this enhancement mirrors the functionality provided through the GUI.
Array Viewer Improvements
The Array Viewer now inherits array type and handles Dive-in-All actions from the Data window.
Parallel Backtrace Viewer Enhancements
The Parallel Backtrace Viewer now handles recursive routines efficiently and adds new CLI-level parallel backtrace commands.
TotalView for CUDA
An add-on is available that allows TotalView to debug NVIDIA CUDA applications. With the NVIDIA CUDA add-on you can debug both the CPU and the GPU code in applications that use CUDA. You can set breakpoints, step, and dive in code running on the CUDA device using all the familiar TotalView GUI methods. TotalView supports multi-device debugging, handles CUDA function inlining, and provides type qualification in the expression system. You can display how your logical threads are mapped to hardware and navigate threads using either hardware or logical coordinates.
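Part of the logical-to-hardware thread mapping can be sketched in plain C++. Within a block, CUDA linearizes thread indices with x varying fastest and groups consecutive threads into warps of 32, so a thread's warp and lane follow from its logical coordinates. The Dim3 struct and the helper names are hypothetical stand-ins for CUDA's built-ins. Which multiprocessor a block runs on is decided by the hardware at run time and cannot be computed this way.

```cpp
#include <cassert>

// Hypothetical plain-C++ stand-in for CUDA's dim3/uint3 built-in types.
struct Dim3 { unsigned x, y, z; };

constexpr unsigned kWarpSize = 32;  // warp size on NVIDIA GPUs

// Linear index of a thread within its block (x varies fastest).
unsigned linear_thread_in_block(Dim3 thread, Dim3 block_dim) {
    return thread.x + thread.y * block_dim.x
         + thread.z * block_dim.x * block_dim.y;
}

struct WarpLane { unsigned warp, lane; };

// Warp within the block and lane within the warp for a logical thread.
WarpLane to_warp_lane(Dim3 thread, Dim3 block_dim) {
    const unsigned t = linear_thread_in_block(thread, block_dim);
    return {t / kWarpSize, t % kWarpSize};
}
```

For example, in a 16x4 block, logical thread (3, 2, 0) has linear index 35 and therefore sits in warp 1, lane 3.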
C++View allows you to write short bits of code in C++ telling TotalView how to transform objects and data structures for representation in the debugger. This gives you an easy way to control the layout and display of information about your data structures when you are debugging your program.
Parallel Backtrace Window
TotalView can bring up a window that organizes information about your running program as a text-based tree built from the stack backtraces of each thread in the program. This makes it easy to see, in a single window, the routines your program is executing. You can expand the tree to see individual program counters and thread status values, and collapse it to hide information.
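A tree like this can be built by merging backtraces that share leading frames. The following C++ sketch does so with a prefix tree whose nodes count the threads passing through each routine. The Node type and the add_backtrace and render functions are hypothetical and are not TotalView's implementation.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// One routine in the merged tree, with the number of threads whose
// backtrace passes through it.
struct Node {
    int threads = 0;
    std::map<std::string, std::unique_ptr<Node>> children;
};

// Merge one thread's backtrace (outermost frame first) into the tree.
void add_backtrace(Node& root, const std::vector<std::string>& frames) {
    Node* cur = &root;
    ++cur->threads;
    for (const auto& f : frames) {
        auto& child = cur->children[f];
        if (!child) child = std::make_unique<Node>();
        cur = child.get();
        ++cur->threads;
    }
}

// Render the tree as indented text, one line per routine with its count.
void render(const Node& node, int depth, std::string& out) {
    for (const auto& kv : node.children) {
        out += std::string(2 * depth, ' ') + kv.first + " [" +
               std::to_string(kv.second->threads) + "]\n";
        render(*kv.second, depth + 1, out);
    }
}
```

Threads that share a call path collapse into a single line, which is why one window can summarize many threads.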
2D Array View Window
TotalView can display two-dimensional arrays and two-dimensional array slices of higher dimensional arrays as numeric grids. TotalView provides significant control over how the data is displayed, since some patterns are better seen when numbers are displayed in scientific notation or floating point notation. You can easily relocate the slices and introduce strides (to see every 10th value, for example).
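A stride selects every nth element along a dimension. The following C++ sketch extracts a strided slice from a row-major two-dimensional array to show which elements a stride keeps. The function strided_slice is hypothetical and does not reproduce TotalView's slice syntax.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return the elements of a row-major rows x cols array obtained by taking
// every row_stride-th row and every col_stride-th column, starting at 0.
std::vector<double> strided_slice(const std::vector<double>& a,
                                  std::size_t rows, std::size_t cols,
                                  std::size_t row_stride,
                                  std::size_t col_stride) {
    assert(a.size() == rows * cols && row_stride > 0 && col_stride > 0);
    std::vector<double> out;
    for (std::size_t r = 0; r < rows; r += row_stride)
        for (std::size_t c = 0; c < cols; c += col_stride)
            out.push_back(a[r * cols + c]);
    return out;
}
```

For a 4x4 array holding 0 through 15, a stride of 2 in both dimensions keeps 0, 2, 8, and 10.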
TVScript makes it possible for you to do non-interactive debugging within regular batch sessions. Simply tell TVScript the functions and line numbers you are interested in and the actions you would like the debugger to take when those points are reached and TVScript runs the program, pausing at the specified points and taking the specified actions. This release adds TVScript support for Cray XT, BlueGene/L and BlueGene/P.