TotalView for HPC Features
Debugging Support for Key Technologies
Powerful Functionality to Make Debugging as Easy as Possible
TensorFlow with Python and C++
C and C++ give you control over the details of data, access patterns, memory management, and execution. But direct control over low-level machine behavior leaves little margin for error when building and maintaining scalable scientific applications. TotalView provides the ideal environment for troubleshooting complex C and C++ applications. It offers detailed views of objects, data structures, and pointers, which simplifies working with complex objects.
The Standard Template Library (STL) collection classes simplify the way you manipulate your program's data, but they complicate troubleshooting when your program hangs or crashes. The TotalView Type Transformation Facility (TTF) gives you a flexible way to define alternate displays for data objects. STLView transformations present a logical view of STL collection class objects, giving a more practical view of data such as the contents of a list. The end result is a simplified, intuitive view into the structure and behavior of your code.
TotalView Reverse Connect
Use Reverse Connect to establish an interactive debugging session between a job executing on compute nodes and the TotalView UI running on a front-end node. The basic process is to embed the tvconnect command in a batch script. When the batch job runs, the tvconnect process connects with the TotalView UI to start the debugger server process on the batch node. For more information, see Chapter 18, “Reverse Connections,” in the TotalView User Guide.
Mixed Language Debugging With Python and C/C++
Many developers use Python to develop applications and call into C/C++ code to perform compute-intensive tasks or to access existing algorithms. Debugging across the language barrier can be challenging, but TotalView makes it easy by showing a fully integrated call stack across both languages and by letting you inspect all of the Python and C/C++ variables and their values. No other debugger makes it this easy for you to understand, diagnose, and fix your mixed-language Python and C/C++ applications!
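To make the mixed-language scenario concrete, here is a minimal sketch of Python calling into C code through the standard `ctypes` module. It is only an illustration of the kind of program described above, not a TotalView example: the helper name `c_string_length` is hypothetical, and the sketch assumes a POSIX-like system where the C standard library can be located at run time.

```python
import ctypes
import ctypes.util

# Locate and load the C standard library. The library name is
# platform-specific; on POSIX, CDLL(None) falls back to the symbols
# already loaded into the Python process (assumption: not Windows).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_string_length(text: str) -> int:
    """Hypothetical helper: measure a Python string with C's strlen."""
    # The Python frame for this function sits directly above the C
    # frame for strlen while the call is in progress.
    return libc.strlen(text.encode("utf-8"))


if __name__ == "__main__":
    print(c_string_length("TotalView"))
```

While `strlen` is executing, the process has Python frames and a C frame on the same stack, which is the situation where a single integrated view of both languages is useful.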
Although C and Fortran have some things in common, Fortran is not C. TotalView correctly represents Fortran notation, types, and concepts, such as common blocks and modules, that are not present in other languages.
Fortran is especially good at representing and manipulating numerical and mathematical data. One of its key characteristics is its facility for representing array data. Scientists and engineers working with Fortran source code are doing so in part to take advantage of language-level support for things like multidimensional arrays, array assignment, and the powerful features of Fortran pointers. Our technology can help you leverage these key attributes of Fortran to ensure working code.
Most of the applications you are developing are engines for manipulating data. Whether observational or computational, it is the data that you really care about. When you are trying to develop insight into the behavior of a physical system you approach it quantitatively. The same approach is necessary when trying to understand the behavior of computational systems.
Troubleshooting involves exploring the behavior of a live application, looking for clues as to why the computation is not proceeding as expected, slicing the data presented in different ways to uncover patterns. It is critical that you have the tools that make it easy to view and manipulate that data, and TotalView helps streamline this process.
The fact that memory is a limited resource has a significant impact on the implementation of your application, especially when it contains millions of lines of code. As program complexity grows, memory leak debugging and troubleshooting malloc errors become more difficult. Memory-related code defects can cause out-of-control resource usage and random data corruption. Memory errors can also manifest themselves as random program crashes, negatively impacting productivity. In a worst-case scenario, memory errors can corrupt data and cause programs to generate inaccurate results. TotalView helps you manage that risk by ensuring working code and accurate results.
TotalView provides comprehensive support for MPI, OpenMP, UPC, and Global Arrays (GA). With support for more than 20 implementations of MPI, TotalView has been the debugger of choice in parallel programming courses.
The era of increasing clock rates has ended. Processor architectures are characterized by multicore and many-core designs. Building a multithreaded application or transitioning from a serial application to a parallel application presents significant challenges. TotalView and ReplayEngine are natively built to help you manage the challenges presented by concurrency, parallelism, and threads.
Race conditions are a common problem, even in a well-tested multithreaded application. You can use locks, semaphores, and atomic operations to avoid race conditions, but they can introduce subtle problems of their own. Our tools provide visibility into the behavior of your code, increasing your understanding of the impact of these problems.
TVScript is a framework for non-interactive debugging with TotalView. You define a series of events that may occur within the target program. TVScript then loads the program under its control, sets breakpoints as necessary, and runs the program. At each program stop, TVScript gathers data, which is logged to a set of output files for your review when the job has completed. If you call TVScript with no arguments, it provides usage guidelines and a listing of available events and actions. TVScript has been likened to printf on steroids.
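As an illustration of the race condition described above, the following sketch has several threads increment a shared counter, with and without a lock. It is a generic Python example, not TotalView functionality. The function name `run_counter` and its parameters are hypothetical. Whether the unlocked version actually loses updates depends on the interpreter version and on thread scheduling, so no particular result is claimed for it.

```python
import threading


def run_counter(num_threads: int, increments: int, use_lock: bool) -> int:
    """Increment a shared counter from several threads and return its value."""
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(increments):
            if use_lock:
                # The lock makes the read-modify-write a critical section.
                with lock:
                    counter += 1
            else:
                # Unsynchronized read-modify-write: two threads may read
                # the same old value, and one increment is then lost.
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print("with lock:   ", run_counter(4, 10_000, use_lock=True))
    print("without lock:", run_counter(4, 10_000, use_lock=False))
```

The locked run always totals `num_threads * increments`. The unlocked run can come up short, and because the shortfall is timing-dependent it may not reproduce on every run, which is what makes this class of defect hard to find by testing alone.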