TotalView Tips

How does MemoryScape (included in TotalView) provide you with the ability to see exactly which code objects, functions and allocations are using up the most memory?

Optimizing your memory footprint can be important, especially as you scale up your application. Getting a back of the envelope estimate may be easy enough, but when your program fails an hour into execution with an Out of Memory error, you need to be able to find out what library, function, or behavior bloated up your application. Check for leaks with MemoryScape, but don't stop there -- use the memory statistics function to compare memory usage across your parallel job and then switch to the hierarchical heap status report. There you will find your heap usage broken down by library and object file. Drill into the ones that catch your interest and you'll find that the next level of information is based on the source code file that was compiled into that object. A few more clicks get you function level info and then line level info. Pick up on subtle trends by comparing two or more heap memory data files with the memory comparison feature.



Using MemoryScape and TotalView together


TotalView 8.7 and MemoryScape 3.0 can be used together, provided you have the appropriate licenses for the products and for the number of processes you are debugging.

1. Use MemoryScape within a TotalView debugging session.

The TotalView Debug menu provides an Open MemoryScape option which will open up MemoryScape on the program that you are debugging. Both interfaces will be open and either one can be used to start or stop the target program.

An advantage of this usage mode is that MemoryScape will coordinate with TotalView to provide pointer annotation on all pointers that point into the heap. This feature will tell you if the pointer you are examining is pointing at a valid memory allocation, is dangling (pointing at an allocation that has been freed), or doesn't point at the heap at all.

Memory debugging should be enabled on the target program when it is started to ensure that MemoryScape will be able to build a complete and valid heap map, necessary for performing analysis.

Most TotalView Team licenses include the right to use MemoryScape in this way.

2. Use TotalView within a MemoryScape debugging session.

If you start out debugging a program with MemoryScape and decide that you need to look more closely at your process you can bring up TotalView by clicking on the bug icon on the toolbar. This will start up TotalView focused on the program that you were debugging with MemoryScape.

This usage mode provides the same advantage described in mode 1: MemoryScape will coordinate with TotalView to provide pointer annotation on all pointers that point into the heap.

You must have both MemoryScape and TotalView licenses for this mode.

I'd like to point out that if you really want to explore the behavior of your program you can actually run a debugging session which simultaneously uses TotalView 8.7, MemoryScape 3.0, and ReplayEngine 1.5 to give you a powerful and complete view into your program, its heap memory allocation behavior and its execution history.


How do I navigate all the source code files that make up my program?

Working with multiple source code files in TotalView is quite straightforward. You can dive on any function name in the source code window to see the sources for that function (if they are available). However, sometimes you want to jump right to a function or file of interest, or you want to compare two bits of code side by side.

View > Lookup function or file — Keyboard accelerator 'f'

Type the name of any source file or function used in your program. You can type a partial, slightly misspelled, or ambiguous name, and the dialog will auto-complete or disambiguate the response.

Click OK. The source code pane in the center of the process window will refocus to that bit of code.

TotalView makes it easy to return to where you were before the find operation. Click the small Back arrow in the upper right corner of the source code window.

View > Reset (Ctrl-R)
If at any time you want to see the current location in the program, use the View > Reset (Ctrl-R) option to get back to where the instruction pointer is pointing.

Window > Duplicate
You can open a second window using the Window > Duplicate option.

You can close duplicate windows individually using File > Close (Ctrl-W) or, if you have several open and want just one, File > Close Relatives.


Does TotalView have a command-line (non-windowing) interface?

You can invoke TotalView's command line interface, which is called the CLI, from within the GUI using the Tools > Command Line command. Or, you can invoke it from a shell using the totalviewcli command.

The TotalView CLI is integrated within version 8.0 of Tcl, which means that TotalView's CLI commands are built into a Tcl interpreter. Because TotalView's Tcl is the standard 8.0 version, the TotalView CLI supports all libraries and operations that run using version 8.0 of Tcl.

Integrating CLI commands into Tcl makes them intrinsic Tcl commands. This means that you can enter and execute CLI commands in exactly the same way as you enter and execute built-in Tcl functions such as file or array. It also means that you can embed Tcl primitives and functions within CLI commands that take arguments. Or, you can embed CLI commands within sequences of Tcl commands.

For example, you can create a Tcl list that contains a list of threads, use Tcl commands to manipulate that list, and then have a CLI command operate on the elements of this list. Or, you create a Tcl function that dynamically builds the arguments that a process will use when it begins executing.

The CLI is described in the "TotalView Users Guide" and the "TotalView Reference Guide."


What is the CLI and why bother?

The CLI (Command Line Interface) is a significant and under-used part of TotalView. The CLI, in a nutshell, gives you a second way to debug your programs. This way is also extensible.

So why did we create the CLI? Here are a few reasons:

  • Some UNIX/Linux people are mouse-phobic, preferring the keyboard to the mouse. (There are some engineers here at TotalView Technologies who only use the GUI as a last resort.)
  • When you don't have much bandwidth, a GUI can really slow you down.
  • We needed a good way to automatically test TotalView. In other words, we use the CLI to do regression testing.
  • The CLI differs from the command line interfaces used by other debuggers in being programmable. We took a standard Tcl shell and added commands to it that access the same routines that the GUI does. This means that TotalView's CLI debugging commands are actually Tcl commands that control debugging activities.

What's in it for you? No debugger can be all things to all people. Your environments are unique and your programs are unique. To create something that meets everyone's needs would be impossible. And this is where the CLI comes in. You can use the CLI to create debugging routines that help you solve your problems.

What's the downside? You've got to learn Tcl (it's easy) and find out what the CLI's commands are. That takes time and when you've got a problem, you're in a hurry.

The CLI is described in the "TotalView Users Guide" and the "TotalView Reference Guide."


How do I configure the CLI xterm?

The xterm window used by the CLI is rather bland. By creating a CLI macro, you can make changes to this environment. Here is a macro that adds scrollbars:

proc xterm_scrollbar {{enable 1}} {
    if {$enable} {
        # Turn on scrollbar
        puts -nonewline "\x1b\[?30h"
    } else {
        # Turn off scrollbar
        puts -nonewline "\x1b\[?30l"
    }
}

This macro will need to be invoked before TotalView creates a CLI window. This is exactly what the TV::open_cli_window_callback variable is used for.

catch {dset TV::open_cli_window_callback {xterm_scrollbar 1}}

If you are already using this callback, you'll need to use dlappend instead of dset.

How do I debug executables linked with shared libraries?


When you start TotalView with a dynamically linked executable, TotalView:

  • Runs a sample process and then discards it.
  • Reads information from the process.
  • Reads the symbol table for each library.

When you attach to a process using shared libraries:

  • TotalView loads the shared library's dynamic symbols if you attached to the process after the dynamic loader ran.
  • TotalView allows the process to run the dynamic loader to completion if you attached to the process before it ran the dynamic loader. It then loads the dynamic symbols for the shared library.

If you are executing a dynamically linked program, information isn't loaded until it is needed. This happens because the linker places code into your program that will do the loading. On most operating systems, you can get the symbols loaded earlier by setting the LD_BIND_NOW environment variable. For example:

setenv LD_BIND_NOW 1

How do I see what TotalView executes when I use mpirun?


When you're using mpirun to start a program, you need to initialize the TOTALVIEW environment variable with the location of TotalView's binary file. If you want to see what is being executed, set the value of TOTALVIEW to "echo". Here's an example:

20 % setenv TOTALVIEW "echo my_totalview"
21 % mpirun -np 4 -tv AttachSubsetAIX
my_totalview /nfs/fs/home/tests/AttachSubsetAIX -a -p4pg
/nfs/fs/home/tests/PI1417328 -p4wd /nfs/fs/home/tests -mpichtv
22 %

Here, "my_totalview" is just a string. If it were the TotalView binary file, you'd be seeing exactly what gets executed.
You can, of course, set the TOTALVIEW variable to anything that will execute.

Using the Attach Subset Command


When you are running a large MPI job, you may not want TotalView to attach to each process as it starts. Instead, you may want to choose the processes to which TotalView should attach. This is exactly what the Group > Attach Subset command is for. Here's the dialog box displayed when you select this command:

Clicking on processes in the top part of dialog box changes the action from Attach to Detach or Detach to Attach. If the process is not yet started, TotalView will not attach to any process whose action is Detach. However, if TotalView is already attached to the process, it will detach itself from processes whose action is Detach.

The controls in the Filters area let you limit which processes TotalView automatically attaches to, as follows:

  • The Communicator control lets you specify that the processes must be involved with the communicators that you select. For example, if something goes wrong that involves one of your communicators, selecting it from the list tells TotalView to only attach to the processes that use it.
  • The Talking to Rank control further limits the processes to those that you name here. Most of the entries in this list are just the process numbers. In most cases, you would select All or MPI_ANY_SOURCE.
  • The three checkboxes in the Message Type area add yet another qualifier. Checking a box tells TotalView to only display communicators that are involved with a Send, Receive, or Unexpected message.

Many applications place values that indicate the rank in a variable so that the program can refer to them as they are needed. If you do this, you can display the variable in a Variable Window and then select the Tools > Attach Subset (Array of Ranks) command to display this dialog box.

Using wide characters


Beginning with version 6.6.0, TotalView can display wide characters using normal C and C++ conventions. If you create an array of wchar_t wide characters, TotalView automatically changes the type to $wstring[n]; that is, it is displayed as a null-terminated, quoted string with a maximum length of n. For an array of wide characters, the null terminator is L'\0'. Similarly, TotalView changes wchar_t* declarations to $wstring* (a pointer to a null-terminated string).

This figure shows the declaration of two wide characters in the Process Window. The Expression List Window shows how TotalView displays their data. The L in the data indicates that TotalView is displaying a wide literal.

If the wide character uses from 9 to 16 bits, TotalView displays the character using the following universal-character code representation:

\uXXXX

X represents a hexadecimal digit. If the character uses from 17 to 32 bits, TotalView uses the following representation:

\UXXXXXXXX

How can I use ssh instead of rsh to launch tvdsvr processes?


To use ssh instead of rsh to launch tvdsvr processes, you will need to change the server launch string. Begin by opening the File > Preferences Dialog Box and selecting the Launch Strings Page.

Look at the Enable single debug server launch area. At version 7.0.1, the launch string on a Linux machine is:

%C %R -n "%B/tvdsvr%K -working_directory %D ...
This string can differ for each architecture that you run on and you can set it differently for each. This launch string is a pattern that TotalView uses when it builds the command that launches the tvdsvr process. You'll find information on what you can type in this area in our Reference Guide.

The %C replacement character indicates which command TotalView uses to launch the tvdsvr process. Here is how you tell TotalView what to do:

If you set the TVDSVRLAUNCHCMD environment variable, TotalView replaces %C with this variable's value. You could even have this set for all users at your location using a global resource file.

You could directly change the pattern within TotalView. For example, your new pattern could be:

/pathtossh/ssh %R -n "%B/tvdsvr%K -working_directory %D ...

TotalView will remember what you've changed this to for all of your sessions.

If you want a change you've made to your local preferences to be used by everyone at your site, you can change TotalView's global initialization file. Here's how:

  • Go to your saved preferences file, which is located in your ~/.totalview/preferences6.tvd file, and locate an entry that looks something like the following:

    dset -version 2 \
    -version {[string compare $TV::platform linux-x86] == 0}\
    TV::server_launch_string {ssh %R -n \
    "%B/tvdsvr -working_directory %D -callback %L
    -set_pw %P -verbosity %V %F"}

    If you add the -set_as_default option to the dset command, this launch string also becomes the default launch string (notice the 2nd line):

    dset -version 2 \
    -set_as_default \
    -version {[string compare $TV::platform linux-x86] == 0}\
    TV::server_launch_string {ssh %R -n \
    "%B/tvdsvr -working_directory %D -callback %L
    -set_pw %P -verbosity %V %F"}

    Why bother? When you use the -set_as_default option, TotalView overwrites its default with your definition. So, if a user makes local changes--local changes override system defaults--and later presses the Default button on a launch string's page, what you enter here is displayed instead of TotalView's default.

  • Copy this setting to your site's preferences file. This will be located in a directory that looks something like:
    installation_dir/toolworks/totalview.7.0.1-0/arch/lib/.tvdrc
    Place the copied string at the end of this file.

    After you do this, users who do not set their own server launch string will use this one.

Creating default preferences


When you press a default button within a File > Preferences dialog box, TotalView reinitializes some settings to their original values. However, what happens when you press a default button after setting a value in your TotalView tvdrc file? In this case, setting a variable doesn't change what TotalView thinks the default is, so TotalView still changes the setting back to its defaults.

The next time you invoke TotalView, TotalView will again use the value in your tvdrc.

You can tell TotalView that the value set in your tvdrc file is the default if you use the -set_as_default option to the dset command. Now when you press a default button, TotalView will use your value instead of its own.

If your TotalView administrator sets up a global .tvdrc file, TotalView reads values from that file and merges them with your preferences and other settings. If the value in the .tvdrc file changes, TotalView ignores the change because it has already set a value in your local preferences file. If the administrator uses the -set_as_default option, you can be told to press the default button to get the changes. If, however, the administrator doesn't use this option, the only way to get changes is by deleting your preferences file.

Using gnu_debuglink files


Some versions of Linux allow you to place debugging information in a separate file. These files, which can have any name, are called gnu_debuglink files. Because this information is stripped from the program's file, it almost always greatly reduces the size of your program. In most cases, you would create gnu_debuglink files for system libraries or other programs for which it is inappropriate to ship versions that have debugging information.

After you create an unstripped executable or shared library, you can prepare the gnu_debuglink file. Here's an overview:

  • Create a .debug copy of the file. This second file will only contain debugging symbol table information. That is, it differs from the original in that it does not contain code or data. Create this file on Linux systems that support the --add-gnu-debuglink and --only-keep-debug command-line options. If objcopy --help mentions --add-gnu-debuglink, you should be able to create this file. See man objcopy for more details.
  • Create a stripped copy of the image file, and add a .gnu_debuglink section to the stripped file that contains the name of the .debug file and the checksum of the .debug file.
  • Distribute the stripped image and .debug files separately. The idea is that the stripped image file will normally take up less space on the disk, and if you want the debug information, you can also install the corresponding .debug file.
The following example creates the gnu_debuglink file for a program named hello. It also strips the debugging information from hello:

objcopy --only-keep-debug hello hello.gnu_debuglink.debug

objcopy --strip-all hello hello.gnu_debuglink

objcopy --add-gnu-debuglink=hello.gnu_debuglink.debug \
    hello.gnu_debuglink

For more information, see the TotalView Reference Guide.

Memory


When you tell your operating system to run a program, it loads the program and defines the address space that the program can access. For example, if your program is executing within a 32-bit computer, the address space is approximately 4 gigabytes.

The operating system does not actually allocate the memory in this address space. Instead, this space is memory mapped, which means the operating system maps the relationship between the theoretical address space and what it actually uses. In this way, your program only uses what it needs. Typically, operating systems divide memory into pages, and then create a map correlating the executing program with the pages that contain the program's information. The following illustration shows regions of a program. The arrows point to the memory pages containing the program's data:

In this figure, the stack contains three stack frames and each is mapped to its own page in this example. Similarly, the heap shows two allocations, each of which is mapped to its own page. (This isn't what really happens as a page can have many stack frames and many heap allocations. But doing this makes a nice picture.)

The programs you write have to be compiled, linked, and loaded. The following figure shows a program whose source code resides in four files. Running these files through a compiler creates object files. A linker then merges these object files and any external libraries needed into a load file. This load file is the executable program that is stored on your computer's file system.

When the linker creates the load file, it combines the information contained within each of the object files into one unit. Combining them is relatively straightforward. The load file shown at the bottom of this figure is simplified; a real load file contains more sections and more information.

The contents of the sections in this load file are as follows:

  • Data section, which contains static variables and variables initialized outside of a function. Here is a small example program:

    #include <stdio.h>

    int my_var1 = 10;
    int main(void)
    {
        static int my_var2 = 1;
        int my_var3;
        my_var3 = my_var1 + my_var2;
        printf("here's what I've got: %i\n", my_var3);
        return 0;
    }

    The data section contains the my_var1 and my_var2 variables. The memory for the my_var3 variable is dynamically and automatically allocated within the stack by your program's run time system.
  • Symbol table section, which contains addresses (usually offsets) to where routines and variables reside.
  • Machine code section, which contains an intermediate binary representation of your program. (It is intermediate because addresses are not yet resolved.)
  • Header section, which contains information about the size and location of information in all other sections of the object file.
When the linker creates the load file from the object and library files, it interweaves these sections into one file. The linking operation creates something that your operating system can load into memory. The following figure shows this process.

Memory: Leaks and dangling pointers


If you have trouble remembering the difference between a leak and a dangling pointer, this may help. Before either problem occurs, memory is created on the heap and the address of this memory block is assigned to a pointer. A leak occurs when the pointer gets deleted, leaving a block with no reference. In contrast, a dangling pointer occurs when the memory block is deallocated, leaving a pointer that points to deallocated memory. Both are shown in the following figure.

Why using realloc() can cause problems


The realloc() function can create unanticipated problems. This function can either extend a current memory block, or create a new block and free the old. Although you can check to see which action occurred, you need to code defensively so that problems do not occur. Specifically, you must change every pointer pointing to the memory block to point to the new one. Also, if a pointer doesn't point to the beginning of the block, you need to take corrective actions.
In the following figure, two pointers are pointing to a block. After the realloc() function executes, ptr1 points to the new block. However, ptr2 still points to the original block, a block that was deallocated and returned to the heap manager. It is now a dangling pointer.

How do I find memory leaks?


The TotalView Memory Debugger can locate your program's memory leaks and display information about them.

  1. Before execution begins, enable the Memory Debugger. (See the Debugging Memory Problems Using TotalView Guide.)
  2. Run the program and then halt it where you want to look at memory problems. Allow your program to run for a while before stopping execution to give it enough time to create leaks.
  3. From the Memory Debugger Window (invoked using the Tools > Memory Debugging command), select the Leak Detection tab.
  4. Select one or more processes in the Process Set area.
  5. Select a view within the Generate View area and click the Generate View button. For example, you might select Source View.
  6. Examine the list. After you select a leak in the top part of the window, the bottom of the window shows a backtrace of the place where the memory was allocated. After you select a stack frame in the backtrace, TotalView displays the statement where the block was created.

(A backtrace is a list of stack frames. The Memory Debugger displays a list that contains the stack frames that existed when you asked the heap manager to allocate memory.)

The backtrace that the Memory Debugger displays is the backtrace that existed when your program made the heap allocation request. It is not the current backtrace.

The line number displayed in the Memory Debugger Source Pane is the same line number that TotalView displays in the Process Window Source Pane. If you go to that location, you can begin devising a strategy for fixing the problem. Sometimes you get lucky and the fix is obvious. In most cases, it isn't clear what was (or should be) the last statement to access a memory block. Even if you figure it out, it's extremely difficult to determine if the place you located is really the last place your program needs this data. At this point, it just takes patience to follow your program's logic.

Can TotalView show me if a pointer is dangling?


If you enable memory debugging, TotalView automatically displays information in the Variable Window about the variable's memory use. The following small program allocates a memory block, sets a pointer to the middle of the block, and then deallocates the block:

1  main(int argc, char **argv)
2 {
3 int *addr = 0; /* Pointer to start of block. */
4 int *misaddr = 0; /* Pointer to interior of block. */
5
6 addr = (int *) malloc (10 * sizeof(int));
7
8 misaddr = addr + 5; /* Point to block interior */
9
10 /* Deallocate the block. addr and */
11 /* misaddr are now dangling. */
12 free (addr);
13 }

The following figure shows two Variable Windows. Execution was stopped before the free() function on line 12 executes. Both windows contain a memory indicator saying that blocks are allocated.

After the free() function on line 12 executes, the messages change:

Can TotalView tell me when a memory block is deallocated?


It can. (If it couldn't, this would be a very short tip.)

The first thing you need to do is to locate the memory block. Here are two ways:

  • From within many of the Memory Debugger's views, right-click on a block or block information, and select Properties from the context menu. This command directly brings up a Block Properties window.
  • From within a Variable Window, select the Tools > Block Properties command. Then, go to the Process Window and select the Tools > Memory Block Properties command.
The displayed window looks something like this (it may look different if you've selected other blocks, and depending upon the way you brought up the window):

You'll need to expand the top area. The easiest way to do that is by pressing the Hide Backtrace Information button.

You may have to press a "+" button to see information.

At the bottom are two checkboxes. If you select Notify when deallocated, the Memory Debugger will stop this process's execution and display a Memory Event Notification window when your program deallocates this memory block.

Memory debugging and attaching to processes


The Memory Debugger needs to interpose its agent between your program and its calls to the malloc library. In most cases (except when running on an RS/6000), this is pretty easy for programs that use just one process: you just enable memory debugging from within TotalView.

Things aren't so simple when you are debugging programs that create additional processes. This includes MPI programs. In these cases, TotalView only attaches to created processes after they begin executing. This is too late for the Memory Debugger. Once the process begins executing, TotalView can't interpose its agent.

You can get around this problem by explicitly linking your program with the agent. Chapter 4 of the Debugging Memory Problems Using TotalView Guide shows how to do this.

Here's an example of linking a Linux program (pathnames were edited to keep them short):

gcc -g -o leakypi cpi.c -I/usr/local/mpich.1.2.5.2/include \
-L/usr/local/mpich.1.2.5.2/lib \
-L/opt/tv/linux-x86/lib/ \
-ltvheap -lmpich -Wl,-rpath /opt/tv/linux-x86/lib

When freeing storage, your program crashes


There are a variety of errors that can occur when freeing and reallocating memory. TotalView's Memory Tracker, which is new in version 6.3, tracks every time your program uses any of the calls within the malloc API. (Even some Fortran programs use this API.) When you free or reallocate memory, it checks to see if the region of memory being freed is a region that was allocated. If it isn't, it stops execution and displays a window that tells you the kind of error that occurred and a backtrace.

Clicking on a stack frame in the backtrace takes you to the statement that caused the problem.

For more information, see the TotalView Users Guide.

Painting allocated and deallocated memory


In large programs, it is nearly impossible to determine if your program is accessing memory that has been allocated and initialized or if it is erroneously accessing memory that was deallocated.

The Memory Debugger can help you identify these kinds of problems by writing a bit pattern into memory locations that your program allocates or deallocates. Writing this pattern is called block painting.

To enable painting:
  1. Select the Tools > Memory command.
  2. Check the Enable memory debugging option.
  3. Open the Block Painting area by clicking on the + symbol.
  4. Change the pulldowns for Paint allocations and Paint deallocations from Pending to On. (Pending means that the Memory Debugger will use default values and doesn't yet know what these values are.)

After your program begins executing, you can open a Variable Window that contains a variable residing in allocated memory. You'll now be able to tell what was initialized.

These snapshots were taken just after the program allocated memory for the red_balls variable and before it set values for the structure members. The upper-left snapshot shows the memory's allocation/deallocation status. The middle window shows the structure's elements. The memory debugger set the values for the value, spare and colour members to 0xa110ca7f. Because colour points to memory that hasn't yet been allocated, TotalView also displays a bad address message.

The values for x and y show the double-precision equivalent of the 0xa110ca7f pattern. This is shown in the bottom-right window, where the Type for variable x was changed to int[2].

Why realloc() has problems, revisited

Here is a snapshot of a small program that allocates some memory, sets a couple of values within the memory, and then reallocates the region:

In the following table, the column on the left contains an Expression List Window that shows the value of the p and q pointers as the program executes. (The program was compiled using gcc on a computer running the Red Hat Linux operating system.) The column on the right describes what you are seeing. The line numbers in this column are those shown in the program snapshot.


Location and Explanation

Line 8: Immediately before the pointer p is initialized to the memory returned by malloc().

Line 9: Immediately after a memory block is allocated and assigned to p. Notice that the memory location is 0x804a008.

Line 11: Immediately after the first set of bytes is initialized to 2.

Line 12: Immediately after the pointer p is incremented to point to the next integer location. The memory location is now 0x804a00c.

Line 13: Immediately after the second set of bytes is initialized to 4.

Line 14: Immediately after pointer q is set to be equal to pointer p.

Line 15: Immediately after pointer p is decremented so that it again points to the beginning of the block.

Line 17: Immediately after the block pointed to by p is reallocated. The value of p is now 0xb7fs8008. Notice that the memory manager has copied the memory values contained in the old block.

Line 18: After the pointer p is incremented. It now points to an integer with the value 4. q, however, is pointing to a different memory location that also contains a value of 4.

Line 19: After the second integer value in the reallocated memory block is set to 10. At this time, p and q are different.

Line 20: After the offset calculated in line 15 is added to p and the result is assigned to q. Both pointers again point to the same memory location.
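The program snapshot itself is not reproduced here, so the following is only a minimal C sketch of the same idea, not the original program. It keeps an offset rather than a second raw pointer, so that the interior pointer q can be rebuilt after realloc() moves the block. The function name realloc_with_offset and the new size of 1024 integers are arbitrary choices for this illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Grow a two-int block and rebuild an interior pointer from a saved
 * offset. Returns the value read back through the block, or -1 if an
 * allocation fails. */
int realloc_with_offset(void)
{
    int *p = malloc(2 * sizeof *p);
    if (p == NULL)
        return -1;

    p[0] = 2;
    p[1] = 4;

    int *q = p + 1;            /* interior pointer to the second int */
    ptrdiff_t offset = q - p;  /* remember the position, not the address */

    int *grown = realloc(p, 1024 * sizeof *p);
    if (grown == NULL) {
        free(p);               /* on failure the old block is still valid */
        return -1;
    }
    p = grown;                 /* the old values of p and q are now stale */

    q = p + offset;            /* recompute q inside the new block */
    *q = 10;

    int result = p[1];
    free(p);
    return result;
}
```

If q were used after the realloc() call without being recomputed, it could refer to the old, freed block, which is the dangling-pointer situation described in the table.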

Visualizing the heap


The Memory Debugger can provide a visual depiction of the heap.

You create this view in the same way as any other:
  1. Select Tools > Memory Debugging.
  2. Select Enable Memory Debugging within the Configuration Tab.
  3. Select the Heap Status Tab.
  4. If you want leaks to be displayed, press the ... button next to the Graphical View pulldown.
  5. In the Preferences Window, select Label leaked memory blocks. If you are debugging a C++ program, you probably also want to select Check interior pointers during leak detection.
  6. After pressing OK, let your program run until it reaches a breakpoint.
  7. Press the Generate View button.

The top area displays bars indicating allocated memory blocks. The block's coloring indicates the block's status. You can change the scale of what is displayed using the magnifying glass icons just above and to the right of the blocks.

When you select a block in the top area, the Memory Debugger looks up information on the block and displays information in the bottom area. While the block you selected is highlighted, the Memory Debugger also highlights other blocks created by code that had the same backtrace.

The bottom area can display information in two ways:
  • Heap Information: the center pane displays information on the block that you selected, and the right pane displays information about all blocks that were created that have the same backtrace.
  • Backtrace/Source: Contains two panes that display the backtrace that existed when the block was allocated and the line that caused the allocation to occur.
If you right-click your mouse on a block in the top portion, the Memory Debugger displays a menu containing a Properties command. If you select this command, the Memory Debugger both displays its Block Properties window, which contains more information about the block, and begins tracking this block for you.

Filtering memory information


If your program allocates a large number of blocks, the information contained within the Memory Debugger's views can be overwhelming. In addition, memory allocations can come from different libraries and different parts of your program. Beginning with TotalView 6.7, you can control how much view information the Memory Debugger displays by applying filters.

To create or edit a filter, select the ... button to the right of the Enable Filtering checkbox.

The Memory Debugger responds by displaying its Data Filters dialog box. To create a new filter, press the Add button. To change an existing filter, select the filter, then press the Edit button. The following figure shows the Data Filters and the Edit Filter Dialog Box (which is the same as the Add Filter Dialog Box):

The way in which you create a filter is similar to the way in which you create a filter within the Mozilla or Thunderbird mail systems.

  1. Within the Add Filter or Edit Filter Dialog Boxes, select the Add button to add a new condition line.
  2. Use the pulldown lists in the first and second columns of the newly added line to select what action you want to occur.
  3. Finally, enter what you want filtered in the third column.
    Notice that you can change the order in which the Memory Debugger will apply a condition. While changing the order shouldn't change the result, you would want to move a condition that removes the most information higher in the list.
 
Here are the values you can select in the first two columns:

Property

The object that the Memory Debugger will look for:

Process/Library Name
Source File Name
Class Name
Function Name
Line Number
Size (bytes)
Count
PC


Operator

The operator indicates the relationship the value has to the property. Select one of the items from the pulldown list. If the property you've selected is a string, the Memory Debugger displays the following list:

contains
not contains
starts with
ends with
equals
not equals
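
As a rough illustration of what the string operators above mean, here is a small C sketch that evaluates one condition against one property value. It assumes the operators have their usual case-sensitive string meanings, which may not match the Memory Debugger exactly; the enum and the function string_condition_holds are hypothetical names, not part of TotalView.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical names for the string operators in the pulldown list. */
enum string_op {
    OP_CONTAINS, OP_NOT_CONTAINS, OP_STARTS_WITH,
    OP_ENDS_WITH, OP_EQUALS, OP_NOT_EQUALS
};

/* Return true when value (for example, a function name) satisfies op
 * with respect to operand (the text typed in the third column). */
bool string_condition_holds(const char *value, enum string_op op,
                            const char *operand)
{
    size_t vlen = strlen(value);
    size_t olen = strlen(operand);

    switch (op) {
    case OP_CONTAINS:     return strstr(value, operand) != NULL;
    case OP_NOT_CONTAINS: return strstr(value, operand) == NULL;
    case OP_STARTS_WITH:  return strncmp(value, operand, olen) == 0;
    case OP_ENDS_WITH:
        return olen <= vlen &&
               strcmp(value + (vlen - olen), operand) == 0;
    case OP_EQUALS:       return strcmp(value, operand) == 0;
    case OP_NOT_EQUALS:   return strcmp(value, operand) != 0;
    }
    return false;  /* not reached for the operators listed above */
}
```

For example, a condition on Process/Library Name with the operator contains and the text graph would hold for a library named libgraph.so.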


If the item is numeric, it displays the following list:

<=
<
=
!=
>
>=


Setting event notifications


The TotalView Memory Debugger tracks all of your program's allocations, deallocations, and reallocations. Some of these events are expected. Others, such as a double free, are programming errors.

You can tell the Memory Debugger to stop execution and tell you when these events occur by selecting the Advanced Button within the Configuration Page.

By default, the Memory Debugger stops execution when any of these events occur. If you remove the checkmark from an event, you won't be notified. It is important to understand that you aren't changing the way in which the Memory Debugger tracks memory events. Instead, you're just telling the Memory Debugger that it shouldn't tell you when the event occurs.

Memory Event Actions Dialog Box and TotalView Team


If you have a TotalView Team license, the Memory Debugger can perform additional actions when an event occurs. Here's a snapshot of the TotalView Team Memory Events Actions dialog box:

Selecting either the Generate a core file and abort the program or Generate a lightweight memory data file option tells the Memory Debugger to write a file when an event occurs. If you tell the Memory Debugger to:
  • Generate a core file, the Memory Debugger writes the file to disk and aborts execution. (The operating system routines that generate the core file also abort the program.) As you are still within TotalView, you can restart your program or load the core file into the current TotalView session.
  • Write a lightweight memory file, the Memory Debugger writes a file that is similar to the file that it writes when you use the File > Export command. These lightweight files can then be read back into the Memory Debugger in the same way that it can read in exported .dbg files. In contrast to what happens when the Memory Debugger creates a core file, your program can continue executing after the Memory Debugger writes this file.
If you select the Stop the process and show the event details option, the Memory Debugger displays its Event Details window. This window also lets you generate these files.

Viewing block properties


As your program executes, you may want to track memory blocks contained within the heap to see what is happening to them. Do this by selecting the Variable Window's Tools > Block Properties command, or by right-clicking on a block within the Heap Status Graphical View and selecting the Properties command.

You can now display these properties by selecting the Tools > Block Properties command from within the Process Window. For example:

If you expand an individual property and then increase the size of the top pane, you'll see something similar to this snapshot. Notice the Notify when deallocated and Notify when reallocated buttons at the bottom of the top pane. When a button is selected, TotalView stops program execution when the block is either deallocated or reallocated.

For example, suppose you have a double free problem. Normally, the Memory Debugger stops execution when the second free occurs. If you select the Notify when deallocated button, TotalView also stops execution at the point where your program first deallocated this memory.

Saving view information


Unlike the text version, which is a static display of the data, the HTML version is interactive, allowing you to display and hide information. You can write this information by selecting the button to the right of the Memory Debugger's Generate View button.

Using guard blocks


When you allocate memory, it is your responsibility not to write data outside the allocated block. The Memory Debugger can help you detect these kinds of problems by surrounding memory allocations with a small amount of additional memory. These additional memory blocks are called guard blocks. If your program overwrites these blocks, the Memory Debugger can tell that a problem occurred.

You can tell that a problem occurred in two ways:
  • When you are displaying a Heap Status view, you can ask for a Corrupted Guard Blocks View. The Heap Graphical view also shows the guard regions and corrupted blocks.
  • When your program deallocates memory, the Memory Debugger can check the deallocated block's guards. If they've been violated (that is, if your program has written data into them), the Memory Debugger can stop execution and alert you to the problem.
For example, suppose you allocate 16 bytes and you write 64 bytes of data. While the first 16 bytes are correctly written, the remaining 48 aren't. You will have overwritten the guard blocks for both blocks and some of the next block's data. That is, you will have inadvertently changed some data, data that when accessed will be incorrect.
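
The idea behind guard blocks can be sketched in a few lines of C. This is only an illustration of the concept, not how the Memory Debugger is implemented: the function names, the 16-byte guard size, and the 0x77 fill byte are all assumptions made for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative sizes and fill byte; not the Memory Debugger's values. */
enum { HEADER_SIZE = 16, GUARD_SIZE = 16, GUARD_BYTE = 0x77 };

/* Layout of one allocation:
 * [header holding the user size][pre-guard][user bytes][post-guard] */
void *guarded_malloc(size_t size)
{
    unsigned char *raw = malloc(HEADER_SIZE + 2 * GUARD_SIZE + size);
    if (raw == NULL)
        return NULL;
    memcpy(raw, &size, sizeof size);  /* remember the user size */
    memset(raw + HEADER_SIZE, GUARD_BYTE, GUARD_SIZE);
    memset(raw + HEADER_SIZE + GUARD_SIZE + size, GUARD_BYTE, GUARD_SIZE);
    return raw + HEADER_SIZE + GUARD_SIZE;
}

/* Return true if neither guard region has been overwritten. */
bool guards_intact(const void *user)
{
    const unsigned char *bytes = user;
    const unsigned char *pre = bytes - GUARD_SIZE;
    size_t size;

    memcpy(&size, bytes - GUARD_SIZE - HEADER_SIZE, sizeof size);
    for (size_t i = 0; i < GUARD_SIZE; i++) {
        if (pre[i] != GUARD_BYTE || bytes[size + i] != GUARD_BYTE)
            return false;
    }
    return true;
}

/* Release a block obtained from guarded_malloc(). */
void guarded_free(void *user)
{
    if (user != NULL)
        free((unsigned char *)user - GUARD_SIZE - HEADER_SIZE);
}
```

With this sketch, writing even one byte past the end of a 16-byte block changes the post-guard, so a later call to guards_intact() returns false. A write that runs far enough past the guard would still damage neighboring data, as in the 64-byte example above.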

Using guard blocks to detect problems is usually an interactive process. Asking for notification when the block is deallocated lets you know that a problem has occurred. Because you now know which block was corrupted, you can use this as a starting point to locate the cause of the problem. In many cases, you will rerun your program, focusing on those blocks. For example, you could set a watchpoint on the end of the block the next time the block is allocated. (Do this by chasing the pointer that points to the beginning of the block, casting the block into an array of $void elements, and then setting the watchpoint on the last element in the array.)

You can step through your program and periodically ask the Memory Debugger to check the guards. This will help you locate where your program is corrupting data.

Setting and using baselines


Debugging memory problems is an iterative process. First you see if you have problems and you try to figure out what the source of the problem is. You also need to evaluate how important the problem is. For example, it is usually a waste of time to look for a small memory leak unless you're convinced that whatever caused the leak will continue leaking memory until you've consumed it all. One of the better ways to see what is going on is to set a baseline using the Process Window's Process > Set Heap Baseline Command.

After setting the baseline, run your program, then use the Process > Heap Change Summary command to generate a summary of your program's memory changes.

Buttons on this window let you obtain information on allocations made since you created the baseline or on leaks that have occurred.

While there are many different ways you can use this information, one of the most common is to set a baseline right before a function executes, then use the Process Window's Next command to step over it, then check to see what has happened.

If you need to examine more closely what has occurred, you can open the Memory Debugger and display a Leak Detection or Heap Status View. While these views normally show all information that the Memory Debugger has collected, each of these views has a Relative to Baseline checkbox that limits the information being displayed to the allocations and leaks that occurred after you set the baseline.
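
The baseline idea can be pictured with a tiny allocation counter. The following C sketch is a conceptual model only, written under the assumption that all allocations go through the two wrapper functions; the names counted_malloc, counted_free, set_baseline and blocks_since_baseline are hypothetical and are not TotalView commands.

```c
#include <stdlib.h>

/* Hypothetical bookkeeping: blocks currently allocated through the
 * wrappers, and the count that was recorded at the baseline. */
static long live_blocks = 0;
static long baseline_blocks = 0;

void *counted_malloc(size_t size)
{
    void *block = malloc(size);
    if (block != NULL)
        live_blocks++;
    return block;
}

void counted_free(void *block)
{
    if (block != NULL) {
        live_blocks--;
        free(block);
    }
}

/* Record the current state, similar in spirit to setting a baseline. */
void set_baseline(void)
{
    baseline_blocks = live_blocks;
}

/* Net number of blocks allocated and not yet freed since the baseline. */
long blocks_since_baseline(void)
{
    return live_blocks - baseline_blocks;
}
```

Calling set_baseline() just before a function runs and blocks_since_baseline() just after it returns gives a quick count of blocks that the function left allocated, which is the same question the Heap Change Summary answers in far more detail.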

Obtaining notification when a block is deallocated


The Memory Debugger can let you know when your program deallocates a memory block.

  1. After your program allocates the memory block, use the Variable Window's Tools > Add to Block Properties command or right-click on the block within the Memory Debugger.
  2. Display the Block Properties Window. After TotalView displays the window, you'll see a list of blocks that you've told the Memory Debugger to watch.

    Expand the top Memory Blocks area. Either press the + symbol or the Hide Backtrace Information button at the bottom of this window. If you press this button, you'll see something like the following:

    Selecting the Notify when deallocated check box tells the Memory Debugger to monitor this memory block so that when your program frees it, the Memory Debugger stops execution and lets you know that this just occurred. After selecting the check box, close the window.
  3. Select the Go button from the toolbar. When the block is deallocated, the Memory Debugger stops execution and displays its Memory Event Details Window.

Comparing memory states


The Memory Debugger lets you compare the state of memory in the following ways:
  • You can compare the existing memory state against a previously saved memory state.
  • You can compare two previously saved memory states.
  • You can compare two existing memory states against one another.
You can also examine an imported memory state in the same way as the memory state of a live process. For example, you can look for leaks, graphically display the heap, and so on.

The Memory Debugger's File > Export command lets you write memory data to disk. At a later time, you can use the File > Import command to import it back in and compare this state with another state.

Now that you've imported data, you can create views into the program's memory exactly as if it were the data for an executing program. In addition, you can create a Memory Compare View. After you select either two imported processes or an imported and a live process, you can tell the Memory Debugger to display the differences between the two sessions.


Displaying memory usage information


You can graphically display memory usage information. Do this by generating a graphical view. Here are some examples:

Here are some features you should be aware of:
  • Change the chart type by selecting an item in the pulldown list. Your choices are: Line, Bar, Stacked Bar, and Pie.
  • You can change and adjust the size of the chart by selecting items from the magnifying glass pulldown list.
  • If you click on a process within most charts, the Memory Debugger selects the process in the lower half of the view. In most cases, it will draw a line or a crosshair so that you can easily compare different process values.
  • Sort the process information by clicking on the text above the tabular information in the bottom part of the view.
  • Suppress the display of memory type by clicking on the colored buttons above the chart. For example, the Text and Data areas (the green and red areas or lines) are not very interesting so you might want to click the green and red buttons so that the Memory Debugger does not display this information.

How do I tell TotalView how to process a signal?


While TotalView always tries to do the right thing about your program's signals, you may want to customize what happens. If you choose the File > Signals command, TotalView displays a dialog box containing a list of signals.

Each signal has four radio buttons associated with it. Selecting a button tells TotalView what it should do when something raises that signal. Your choices are: Error, Stop, Resend, and Ignore.

For more information, see Handling Signals within the "TotalView Users Guide."

How do I stop TotalView from popping up "Stop Process" questions?


When your program loads new processes, TotalView pops up question boxes asking if it should stop execution so that you can set breakpoints within the process. You could, of course, do other things.

TotalView asks these questions when your program uses a dynamically loaded library or when it starts other processes. Here's an example of one kind of question box:

Depending upon the number of times this happens, this can be extremely annoying.

You can stop these questions from appearing by selecting the File > Preferences command and making changes in one or both of the following pages:
  • Clear the Ask to stop when loading dynamic libraries checkbox on the Dynamic Libraries page.
  • Select Run the group within the When the job goes parallel or calls exec() area on the Parallel page.

Why am I seeing assembler instead of source code?


If TotalView is displaying assembler instead of source code and you've compiled your program using -g, you could be suffering from a mis-set search path.

I often see this problem when I move my program from where I compiled it.

You can fix this using the File > Search Path Command.

Use the EXECUTABLE_PATH tab to tell TotalView to add these directories to your search path. The directories named here are searched before the directories listed in your PATH variable. If this isn't what you want or if you find this inconvenient, you have three alternatives:
  • Alternative 1: Make sure your search path is set correctly before starting TotalView.
  • Alternative 2: Use the CLI to reset your search path. For example:

    dset EXECUTABLE_PATH ../test_dir:$EXECUTABLE_PATH

    If you place this statement in your .tvdrc file, you won't have to think about setting your path whenever you start TotalView.
  • Alternative 3: Create a setting for your files using the other tabs in this dialog box. Using these tabs is described in the online help.

The Search Path and TotalView Built-in Functions


TotalView has two built-in functions that can help you set search paths. Both take a string argument and return a mapped string (MS). If the MS does not name a directory, no directories are searched.

$tree(string)

If the MS names a directory, TotalView searches all the directories contained in the MS, including the MS itself.

$link(string)

If the MS names a directory and the file is not in the MS or the file is in the MS but it is not a symbolic link, the function returns no values.

Since you can have multiple levels of symbolic links, TotalView keeps following links until it finds the actual executable file. It will then look in this directory for your file. If the file isn't there, TotalView backs up the chain of links until either it finds the file or determines that it cannot find it.

The following figure shows the Sources tab from within the File Searching dialog box. In it, you will see five items. TotalView will look in each of these places when searching for your source files. The $link function also tells TotalView to look for symbolic links in the directories named in the EXECUTABLE_DIRECTORY variable. (Future tips will discuss these directories. However, you can find the information you need in the online help.)


Creating a Search Path Directory Tree


The previous tip of the week mentioned the $tree function that you can add to the Source tab of the Search Path dialog box.

Here's how $tree was defined (slightly rewritten):

$tree(string)

TotalView searches all the subdirectories contained in string, including the string directory itself, for a file.

This tip provides more details on using this function.

TotalView uses the directories you name in the $tree function as a starting point to walk through the subdirectories contained within the arguments to $tree. That is, you are adding this directory and all of the directories underneath it to the TotalView search path list. Assume that your source files are underneath:

/home/foo/source

In addition, foo/source contains other directories:

database gui objects...

Each of these directories contains its own series of subdirectories that organize your program into different related areas. If this hierarchy continues for multiple levels, it is tedious to add each individual directory to the search path. Using $tree, all you have to do is type:

$tree(/home/foo/source)

This function tells TotalView to search all the subdirectories underneath source as well as the files contained within source.

If you only need to include one or two directories, you can enter each as an argument to a $tree function. Do this by separating the arguments with colons (:).

Here's an example:

$tree(/home/foo/source/database:/home/foo/source/objects)

TotalView searches the path rules in the Source tab from top to bottom, and you cannot add a $tree function as the first or last line in this tab's list. However, you can enter more than one $tree function and you can use the colon separators to name more than one top-level directory.

How do I set the way TotalView handles signals?


It depends upon what you want to do. If you just want to override the way signals are handled one time, use the File > Signals command. If, however, you want to override them every time you use TotalView, you can use the TV::signal_handling_mode CLI variable within your .tvdrc file. (This variable was added at Release 6.0.) You do not need to set all signals, just the ones you are changing.

For example, here is the command you'd enter into your .tvdrc file if you want TotalView to resend a SIGTERM signal:

dset TV::signal_handling_mode {Resend=SIGTERM}

Chapter 4 of the "TotalView Reference Guide" describes setting and using this variable. (You'll need to scroll the page to locate this information.) Chapter 3 of the "TotalView Users Guide" tells you about .tvdrc files.
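
Resending a signal mostly matters when the program being debugged installs its own handler, because the handler only runs if the signal actually reaches the program. As a minimal sketch of such a program, the following C code installs a SIGTERM handler and then sends the signal to itself. The names on_sigterm and deliver_sigterm_to_self are invented for this illustration, and raise() merely stands in for a signal arriving from outside.

```c
#include <signal.h>

/* Set by the handler; sig_atomic_t is safe to write from a handler. */
static volatile sig_atomic_t sigterm_seen = 0;

static void on_sigterm(int signo)
{
    (void)signo;
    sigterm_seen = 1;
}

/* Install the handler, send SIGTERM to this process, and report
 * whether the handler ran (1), or -1 if something failed. */
int deliver_sigterm_to_self(void)
{
    sigterm_seen = 0;
    if (signal(SIGTERM, on_sigterm) == SIG_ERR)
        return -1;
    if (raise(SIGTERM) != 0)
        return -1;
    return sigterm_seen;
}
```

A program like this relies on its handler being called, so it is the kind of program for which a Resend setting for SIGTERM is likely to be the right choice.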

Installing Tcl/CLI macros


Now that you have mastered the slowly procedures, you need to get them into TotalView. The easiest, but not best, way is to use the Tcl source command. For example, here is what you would type from within a CLI window:

source slowly.tvd

Your file doesn't need to have a file name extension and you can use any extension you want. However, by convention, TotalView CLI macros have a .tvd extension. As there is no default extension, you must type the complete file name.

The better way is to place these procedures in a startup file and have them automatically become part of your TotalView environment.

If you look in your home directory, you'll find a .totalview subdirectory. If you have a file named .tvdrc within it, you already have a startup file. If you don't have one, copy the slowly.tvd file to this directory and rename it to .tvdrc. If you have a .tvdrc file, copy slowly.tvd to this directory, then open your .tvdrc file and add the source statement. You can, of course, just copy the contents of slowly.tvd into your .tvdrc file but you might as well keep things separate.

The next time you start TotalView, the slowly command will be available.

How do I set (or reset) command arguments?


TotalView, like almost all programs, has a number of command line options. The program you are debugging also has arguments. TotalView's -a option lets you separate options needed by TotalView from those you're sending to your program. That is, everything after the -a belongs to your program and will be sent to it.

After you get your program running under TotalView's control, you may want to modify these arguments or add arguments that you forgot. The first time you open it, the Arguments Tab within the Process > Startup Parameters Dialog Box displays the arguments that you entered on the command line. For example, suppose you started TotalView in the following way:

totalview arraysLinux -a 3 3

Here is what appears in this dialog box:

While you can edit these arguments at any time, TotalView only sends them to your program when it starts or restarts executing.

Startup (Part 1): tvdrc


TotalView needs to work in a great many environments. What has evolved is a rather complex system where startup data can be placed in many different places. Here is an overview:

The tvdrc file is where you'll find and place startup commands. Notice that there are three of them. If they exist, TotalView will read each one! All three can contain command definitions, commands, and variable settings. The technical term for what's found in these files is stuff.
  • The global .tvdrc file is most often used by TotalView administrators. An administrator would place stuff here that affects all users.
  • The tvdrc file (notice no period in the name) resides within your home directory's .totalview subdirectory. This is where you place stuff that you always want set.
  • The local .tvdrc file is in the same directory as your executable. It contains stuff important when you are debugging just that executable.
The files are always read in this order.

Startup (Part 2): The command line


Command line options have a different purpose: they affect just the session being started.

Generally, command line options tell TotalView to do something that is best done before it starts executing. For example, the -pvm option enables PVM support within TotalView, which is something you can't do once TotalView starts executing. Or the -nlb option tells TotalView that it shouldn't automatically load breakpoints (loading them is the default). Command line options are described in:
  • TotalView Command Syntax
  • TotalView Debugger Server Command Syntax
Two options that are more general are -e and -s. The -e option lets you directly execute a CLI command. For example:

totalview -e 'puts hello'

The -s option tells TotalView that it should execute the CLI commands contained within a file. This is really handy when you have more than one .tvdrc file or when you need to debug a program in different environments.

You can use more than one -e and -s option on one command line. For example:

tvmain -e 'set foo 3' -e 'puts "The value of foo is $foo"'

Startup (Part 3): Preferences


In most cases, the reason you put information in a tvdrc file is because you want something to be set all the time. In contrast, you use command-line options to do something just for the current session.

Preferences serve a third function: they indicate things that you will want to change either within a session or as needed in a convenient way. Here's the kind of information that can be set using preferences:
  • Action points: lets you set the scope of what is stopped, if the action point is set in a share group, automatic saving and loading, and if a Process Window is opened when a breakpoint is hit.
  • Launch strings: these are best set in a tvdrc file.
  • Bulk launch: these are best set in a tvdrc file.
  • Dynamic libraries: asks if TotalView should stop when loading dynamic libraries. This lets you set breakpoints within them.
  • Parallel: enables use of dbfork and what actions to take when your job goes parallel.
  • Fonts: lets you indicate which fonts TotalView should use when displaying information.
  • Formatting: sets the precision at which TotalView displays numbers and strings.
  • Pointer Dive: controls how pointers are dereferenced and how pointers to arrays are cast.
In most cases, the defaults are probably OK for your needs. The most often used preferences are Fonts and Formatting. For detailed information on preferences, see File Preferences.

How do I manage my search path so TotalView finds my source files?


Most of the time, you do not need to worry about your search path since your compiler places information about a file's location within the debugging information that TotalView uses. If, however, you move your executable or if you are using a compiler that doesn't do a good job, TotalView may not find some of your source code unless you set a search path.

Here are three ways to set it:

  1. If you always debug from somewhere inside your source tree, set the path in a .tvdrc file in the directory where you start TotalView. (This tip is actually part 4 of the startup tips.)

    Here's an example:

    dset EXECUTABLE_PATH "./src1:./src2:./src2/subdir1"
  2. If you have an environment variable that points to the source directory, you can use it within a Tcl statement that sets the EXECUTABLE_PATH variable. For example:

    dset EXECUTABLE_PATH "$DEVEL_ROOT/src1:$DEVEL_ROOT/src2:$DEVEL_ROOT/src2/subdir1"

    TotalView will expand the DEVEL_ROOT environment variable to define a search path.
  3. You can use Tcl to expand things for you. The following example does the same thing as the second example:

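    # Root of the source tree, taken from the environment.
    set root_dir $env(DEVEL_ROOT)

    # Replace the " ." in front of each entry with ":$root_dir" to
    # build a colon-separated list of directories.
    set module1_subdirs { ./src1 ./src2 ./src2/subdir1}
    regsub -all { +\.} $module1_subdirs ":$root_dir" \
        tv_subdirs1

    set module2_subdirs { ./src3 ./src4 ./src5/subdir1}
    regsub -all { +\.} $module2_subdirs ":$root_dir/module2" \
        tv_subdirs2

    dset EXECUTABLE_PATH ".$tv_subdirs1$tv_subdirs2"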

How do I DYNAMICALLY manage my search path so TotalView finds my source files?


You can use the EXECUTABLE_PATH variable to set a predefined search path. Here's a Tcl proc that will do a better job:

# Usage:
#
# rpath [root] [filter]
#
# If root is not specified, start at the current directory. filter is
# a regular expression that removes unwanted entries. If it is not
# specified, the macro automatically filters out CVS/RCS/SCCS
# directories. The TotalView search path is set to the result.

proc rpath {{root "."} {filter "/(CVS|RCS|SCCS)(/|$)"}} {

    # Invoke the UNIX find command to recursively obtain
    # a list of all directory names below "root".
    set find [split [exec find $root -type d -print] \n]

    set npath ""

    # Filter out unwanted directories.
    foreach path $find {
        if {! [regexp $filter $path]} {
            append npath ":"
            append npath $path
        }
    }

    # Tell TotalView to use it.
    dset EXECUTABLE_PATH $npath
}


Notice that the CLI is executing the shell's find command to traverse your computer's file system. The found directories are then appended to a variable.

Notice that the statement setting the EXECUTABLE_PATH variable is the only statement that is unique to the CLI.

How do I change command line arguments and variables?


The focus of the last five tips has been on configuring TotalView. Now that TotalView is up and running, you may need to alter or add to the command line arguments and environment variables that were passed to TotalView from the shell. This is what the Process > Startup Parameters command is for.

The Arguments tab shows what arguments were passed in. You can add arguments or edit what is here. The second tab allows you to add new environment variables. Just type the variable and its value in the text field. The format is name=value.

Important: The changes you make are ignored until you restart your program.

Are you having problems reading some symbols in system libraries?


When TotalView reads in symbols for a shared library, it may only read in loader symbols if the library is contained within a directory such as /usr/lib. You can tell TotalView to do more by selecting the Dynamic Libraries Page from within the File > Preferences dialog box.

A shared library contains, among other things, loader and debugging symbols. Typically, loader symbols are read quite quickly. TotalView doesn't read other symbols because they can require considerable processing.

By naming the library in this preferences dialog box, you can change TotalView's behavior. Note that if the library is already read in, you'll need to restart TotalView after making a change here.

How do I get TotalView to start faster?


If your program uses many libraries, TotalView might take a long time to get started. This is because TotalView is reading a lot of debug information. You can change the way TotalView reads in libraries by selecting the Dynamic Libraries Tab within the File > Preferences Dialog Box.

Here's what you should do:
  • Select no symbols, and then type *
  • If you are interested in some libraries, select all symbols, and then type a glob-style wildcard for the libraries you're interested in. For example, you could type:
    *libgraph*
If you need to interactively load a library's symbols, you need to use the CLI. Here's what you should do:
  1. Open the CLI by selecting the Tools > Command Line command.
  2. Type the following command:
    TV::read_symbols -lib libname.so

How do I stop TotalView from timing out while attaching to processes on other computers?


After TotalView automatically launches the tvdsvr process, it waits 30 seconds for it to respond. If the connection isn't made in this time, TotalView times out. You can change the length of time TotalView waits by changing a setting in the Launch Strings Page of the File > Preferences Dialog Box:

You can enter a value from 1 to 3600 seconds (1 hour) in the Timeout field.

If you are using bulk launch, there are similar controls in the Bulk Launch Page.

If you notice that TotalView fails to launch tvdsvr before the timeout expires (information is displayed in the xterm window from which you started TotalView), click Yes in the question Dialog Box that appears.

Using the File > New Program command

When you select the File > New Program command from either the Root or Process Window, TotalView displays its File > New Program dialog box:

(If you type totalview from a shell prompt, TotalView automatically displays this dialog box.) While this may look a lot simpler than the dialog box it replaces, there's now a lot more power. Here's an overview:
  • The pulldown list on the left in this figure is displaying Start a new process. Other choices are: Attach to a new process and Open a core file.
  • The on host pulldown list shows the systems you're attached to and also allows you to add additional hosts. In addition to being able to start jobs on these systems, the list displayed when you select Attach to a new process shows the processes running on that system. (While you could do this in older versions, the way you did it was very obscure.)
  • We now remember programs you've debugged. Select one of these programs from the Program pulldown list. We also remember some of the attributes set for that program, so you don't have to do as much work getting started as you did in previous versions.
  • The Arguments and Standard I/O tabs let you specify the same information that you can still specify using the Process > Startup Parameters command. It's duplicated here as a convenience.
  • The new Parallel tab makes it easier to start MPI and poe programs.


Configuring TotalView for starting MPI programs


The File > New Program command now has a Parallel tab.

The Parallel system pulldown list names default configurations. If the parallel system is located in a directory named in your PATH environment variable, starting your program should be straightforward. However, if your system isn't on this list or there is something unique in your environment, you will need to create your own environment. After you create an environment, it appears as a choice on this list.

Loading multiple core files


When a multi-process program dumps core, it usually dumps more than one core file. Beginning at version 8.3, you can load all of these core files simultaneously using a wildcard. For example:

totalview my_program core*

If you select more than one file in the Attach to process area of the File > New Program dialog box, TotalView attaches to all of the selected files. These files are placed in the same control group.

How do I debug executables linked with shared libraries?


This tip is just an overview. The shared library information in the "TotalView Reference Guide" contains the full story, which is only 2 1/2 pages long.

When you start TotalView with a dynamically linked executable, TotalView:
  1. Runs a sample process and then discards it.
  2. Reads information from the process.
  3. Reads the symbol table for each library.

When you attach to a process using shared libraries:
  • TotalView loads the shared library's dynamic symbols if you attached to the process after the dynamic loader ran.
  • TotalView allows the process to run the dynamic loader to completion if you attached to the process before it ran the dynamic loader. It then loads the dynamic symbols for the shared library.
If you are executing a dynamically linked program, information isn't loaded until it is needed. This happens because the linker places code into your program that will do the loading. On most operating systems, you can get the symbols loaded earlier by setting the LD_BIND_NOW environment variable. For example:

setenv LD_BIND_NOW 1


How do I get back to where I was before I began diving?


If you dive on a function name, TotalView displays the function in the Source Pane. After you've looked at what you want to look at, the "undive" button lets you return to where you came from. You can find this button immediately above and to the right of the Source Pane.

Each time you dive on a function, TotalView adds a chevron (">") to the left of the function name. For example, the four chevrons in this figure show that I've dived four times. So, pressing the "undive" button four times gets me back to where I started.

The Variable Window also has an undive button that does the same thing for variable information.

How do I use the Stack Frame pane?


The Stack Frame Pane shows all of the selected function's parameters, local variables, and registers.

The Stack Frame Pane does not show global variables. If you want to see them, use the Tools > Program Variables command.

You can change a variable or register's value if TotalView is displaying it in bold.

You can show information about other functions on the stack by clicking on the function's name in the Stack Trace Pane.

If you want to see more information for a variable, dive (double-click) on it. TotalView responds by displaying a Variable Window containing this information. Diving lets you see all of an array's values, the value pointed to by a pointer, and the elements of a struct.

How do I know what state my processes and threads are in?


The Status column in the Root Window has this information.

The following table explains all the codes that can be used in this and in other windows:


State Code    Explanation
blank         Exited or never created
B             At breakpoint
E             Error
H             Process held
h             Thread held
I             Idle unattached process
K             In kernel
M             Mixed
R             Running
S             Sleeping unattached process
T             Stopped
W             At watchpoint
Z             Zombie unattached process

The table in the Process Window's Threads Pane also displays some of these codes. And, you'll find state information just above the Stack Frame and Stack Trace Panes.

How do I use the Action Points tab?


The Action Points Tab shows all of your program's action points.

Column 1
The kind of action point

Column 2
TotalView's internally generated action point ID

Column 3
The file containing the source line.

Column 4
The action point's line number and function

If you left-click on an enabled action point, you disable it, and vice versa. If you right-click on an action point, TotalView displays this context menu. If you dive (double-click) on a line, TotalView scrolls the Source Pane so that it is displaying the line at which the action point was defined. This means that you can use breakpoints to quickly get to places in your code. In other words, create disabled breakpoints at places you think you'll need to go. Diving on these "navigational" action points quickly takes you to where you want to be.

Before adding a lot of action points, make sure that you've set the TotalView preference that tells TotalView to automatically save and load action points.

What's the best way to capture data values?


Is your monitor covered with yellow stickies because you want to remember data values? That is, your program is in a loop and you want to see the value of a variable the last time through the loop or perhaps even its original value. Here are some techniques for cleaning up your monitor:

  1. Use a screen capture program. In other words, take a picture and then display it when you need to.
  2. Cut and paste into a text editor window. On some systems, you'll have to use TotalView's copy command to capture the information.
  3. Use the Tools > Command Line command to open a CLI window. Then, use the dprint command to see values. For example:

    dprint {my_array[3:6]}

    Notice the braces surrounding the variable. They are needed in C and C++ because brackets have special meaning in Tcl. If you use this technique, you'll probably want to tell TotalView that it shouldn't display much other information. Here's the CLI command for doing this:

    dset VERBOSE ERROR
  4. Use an eval point containing a printf() statement.

However, using yellow stickies can be an effective debugging technique.

How do I get around focus grabs?


Some applications may grab the mouse or keyboard. When this happens, you can't use TotalView (or any other application). Here are three solutions:
  • Using your mouse and your window manager, iconify the windows of the application that grabbed the keyboard.
  • Use VNC to set up another logical display for TotalView or your application (it doesn't matter which) so that the grabs don't affect TotalView.
  • Use a rootless PC X-server like Exceed in native window-management mode. This runs the window manager within Microsoft Windows instead of X.

How do I "go to a line"?


When your compiler creates error messages, it usually displays a line number. While TotalView displays line numbers, there is no "Go to line number" command. Here's a hack:

Use the Edit > Find command and use the line number as the search string.

For example, suppose you're looking for line 100. Enter this number in the dialog box and then press the Find button. Before you do this, make sure you've selected something, anything in the Source Pane. This tells TotalView where to look.

This procedure does have problems. In this case, TotalView finds the number 100 anywhere in your program. It could be a constant in your program, or it could be line 1001 or 1100. You can make the search string a bit more precise by adding a space or two to the number.

(Adding a "Go to line" command is on our "to do" list. Unfortunately, this list is long.)

There's another way, and it too is a hack.

If you invoke the Action Point > At Location Command or, better, use its accelerator, which is Ctrl+B, TotalView displays a dialog box into which you can type a line number. You can now double-click on the action point that was just created and you'll be at that line.

In many cases, you're going to the line because there's something interesting so you want a breakpoint there. However, if you don't want the breakpoint, just click on it in the line number area.

Process Groups


TotalView automatically places processes into groups. There are two kinds of process groups: control and share. Here are quick definitions of these groups:

Control group
The executable that you invoked to start execution and all processes that this process started.

Share group
All processes within the control group that are executing the same file.

Much more precise definitions for these groups are contained within the "TotalView Users Guide."

The following diagram shows a quad-core chip, with each core shown as a white square. Notice the squiggly lines. These lines represent threads. This representation indicates that every process is made up of threads. In this case, processes 1 through 3 each have 1 thread.

Here's what this diagram is telling you:
  • If we assume that the program running in core 1 was the primary executable, it spawned at least one additional process.
  • The process running on core 2 is identical to the process containing the executable on core 1. You can tell that they are identical because they are both in the same share group. While they are identical, this doesn't say anything about where the PC is located in either executable.
  • The executable running on core 3 was launched either by the executable running on core 1 or core 2 (it doesn't matter which). However, this executable is different from that being run on the other two cores. You know this because it is in a different share group.
  • Nothing is executing on core 4.
Many of your programs will run on more than one chip. For example, you could have a blade server with multiple multi-core chips within it and your blade servers will be connected in a cluster. What do you think is the structure of control and share groups in this kind of environment? Actually, it really doesn't change.

The following illustration shows three quad-core chips.


The first thing to notice is that there is still just one control group and there are still just two share groups. The major difference is that the share groups contain more elements. In addition, I've added additional details:
  • Core 4 on each chip is now in a different color. This indicates that it is not executing anything. That is, these three cores are not in the share group.
  • The black lines between the quad-core chips are the communication channels between chips or between boards. What these channels are doesn't really matter.
  • The blue box in the left chip represents TotalView. The brown rectangles in the other two represent the TotalView Debugger Server (tvdsvr).
As processes are created on other chips, TotalView launches a tvdsvr process on each. This is a light-weight process that sends information back to TotalView and interacts in minor ways with the processes running on your chip. Only one is needed for each chip as it can handle the interactions for all cores on a chip.

At times, you may want to create your own organizations.

After selecting the Groups > Custom Groups command, here's what you do:
  1. Name the group.
  2. Select the processes that will be members of the group.
  3. Dismiss the dialog box. You are asked to confirm that the selection is correct.


In this example, a group named Quarters was created by selecting every fourth element.

TotalView will add the new group name to the Process/Thread selector. When you select Quarters, TotalView shows the processes contained within Quarters in the Process tab.


If you are a CLI user, you can also create your own groups using the dgroups command. Here's an example:

dgroups -new p -g Quarters {4 8 12 16}

The newly created process group will appear in the Process/Thread (or scope) selector in the Process Window.

Other CLI commands for manipulating groups are:
  • dgroups -add
  • dgroups -delete
  • dgroups -intersect
  • dgroups -list
  • dgroups -remove
  • TV::group
CLI commands are documented in the TotalView Reference Guide.

Thread Groups


TotalView automatically places your program's threads into two groups:
  • Workers Group: All of your program's threads within the current control group that are executing. These threads can reside in more than one share group.
  • Lockstep Group: All threads that are at the same PC (program counter). A lockstep group only exists for stopped threads. By definition, all members of a lockstep group are within the same workers group. That is, a lockstep group cannot have members in more than one workers group or more than one control group.
While TotalView doesn't make the distinction, it is sometimes helpful to distinguish the threads that are performing the actual work from threads that are performing a service. For example, you may have a thread that performs queue management functions. When you create your own thread groups (this will be discussed in a later tip), you may wish to place these kinds of threads into a separately named group. In this way, you can allow these "service" threads to execute as you are stepping other threads.

Your program may also have threads that are part of the operating environment for your program and which you shouldn't control. In the Root Window, these threads are identified with a negative thread ID.
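
If it helps to see the lockstep rule written down, here is a minimal C sketch of it. This is a model for illustration only: the thread_info structure and both function names are invented here and are not TotalView data structures, and the sketch assumes that lockstep members must be stopped, sit in the same share group, and have the same PC as the thread you start from.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative model only; these fields are assumptions, not TotalView's. */
struct thread_info {
    int           share_group; /* ID of the share group of the thread's process */
    unsigned long pc;          /* program counter */
    bool          stopped;     /* true if the thread is stopped */
};

/* True if t would be in the same lockstep group as the thread toi. */
bool in_lockstep_with(const struct thread_info *toi,
                      const struct thread_info *t)
{
    return toi->stopped && t->stopped &&
           t->share_group == toi->share_group &&
           t->pc == toi->pc;
}

/* Counts how many of the n threads are in lockstep with toi
 * (toi itself is counted if it appears in the array). */
size_t lockstep_count(const struct thread_info *toi,
                      const struct thread_info *threads, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (in_lockstep_with(toi, &threads[i]))
            count++;
    return count;
}
```

Notice that a running thread never qualifies in this model, even if it happens to be at the same PC, which matches the rule that a lockstep group only exists for stopped threads.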

The following illustration shows the relationship of share, workers, and lockstep groups. (The control group isn't shown. Assume all of what you see is in the same control group.)


What do run controls run?


This tip discusses how TotalView understands what to run when you press a run command such as Go, Step, Out, and so on. For example, what does "stepping a group" actually mean? What happens to processes and threads that aren't in this group? TotalView uses three concepts to decide:
  • TOI: Thread of Interest, which is the thread in the current Process Window.
  • POI: Process of Interest, which is the process in which the current threads execute.
  • GOI: Group of Interest, which is the current group. This can be set using the Scope pulldown in the toolbar.
All three can also be set in the CLI.

These concepts let TotalView determine the scope of what it does when it executes a command.

Associated with these constructs is the arena. The arena is the collection of processes, threads, and groups that are affected by a debugging command.

You will set the arena by choosing an item in the Scope pulldown list. In addition, you can set the arena using commands in the menubar. For example, there are eight next commands. The difference between them is the arena; that is, the difference between the next commands is the processes and threads that are the target of what the next command runs.

When TotalView executes a run command, figuring out the arena lets it decide what to do with other threads under its control. Depending on the command, TotalView determines the TOI, POI, or GOI, and then executes the command's action on that thread, process, or group. For example, suppose TotalView steps the current control group:
  • TotalView needs to know what the TOI is so that it can determine which threads are in the lockstep group; TotalView only lets you step a lockstep group.
  • The lockstep group is part of a share group.
  • This share group is also contained in a control group.
By knowing what the TOI is, the GUI also knows what the GOI is. Now, TotalView knows what it will step (the threads in the lockstep group). It also knows that it will allow all other threads in the GOI to run freely while it is stepping the threads in the lockstep group.


More on the TOI, POI, & GOI


The following illustration shows a control group that has five executing processes. Similarly, each process has five threads. The horizontal threads are manager threads; that is, they are created by the operating system to help manage your application. This means that they are not worker threads. Each of the processes has three vertical threads that actually do the work. However, one of these threads really doesn't contribute so it is drawn differently than the other two. (You can use the CLI's dworker command to tell TotalView that it isn't a worker thread.)


The thread with the red background is the TOI (Thread of Interest). This is the thread that is displayed in the Process Window. If you are using the CLI, the TOI is either the current focus or a temporary focus set using the dfocus command.

Because the TOI is known, TotalView knows that:
  • The POI is the process that contains the thread with the red background.
  • The GOI can be share group 2 or the control group. TotalView determines which one based on what is selected in the scope pulldown list.
  • If your selection isn't one of the Group elements, the GOI is irrelevant. If you've selected a thread, the GOI and the POI are irrelevant.

The CLI and Groups


The CLI dgroups command can also create thread groups, which you can't do using the GUI. A previous tip showed using the dgroups command to add a new process group. Here's the command for creating a threads group.

dgroups -new t -g threads2 2.2 3.3 4

The only difference is the letter "t" instead of the letter "p". This example creates a new thread group named threads2 that contains threads 2.2, 3.3, and all the threads from process 4. While you can't create thread groups in the GUI, the name you've created will appear in the Scope pulldown in the GUI. When selected, TotalView will only act upon the threads in that group.

In general, you cannot add a process to a control group. If you need to do this, use the following command:

dset CGROUP($my_group) $CGROUP(1)
This adds the processes in control group 1 to the group named my_group. As an exercise, try to think how you can add all processes in all control groups to my_group.

Consult chapter 2 of the TotalView Reference Guide for more information on the dgroups command.


Setting Focus in the CLI


In the GUI, you set the focus (or scope) of a command such as Step by selecting an entry in the pull down list on the left side of the toolbar. In the CLI, you set this by using the dfocus command. For example, here is the command that changes the focus to group 2:

dfocus g2
The letter "g" tells the CLI which processes and threads it will control. Here's the complete list of letters:

t
Thread
A command's target is the indicated thread.

p
Process
A command's target is the process that contains the TOI (Thread of Interest).

g
Group
A command's target is the group that contains the POI (Process of Interest). This indicates control and share groups.

a
All processes
A command's target is all threads in the GOI (Group of interest) that are in the POI.

The letters above set the width of the focus. You can also combine the width with another letter that indicates the kind of group. Here are the letters you can use:

C
Control group
All processes in the control group.

S
Share group
The set of processes in the control group that have the same executable as the arena's TOI.

W
Workers group
The set of all worker threads in the control group.

L
Lockstep group
A set that contains all threads in the share group that have the same PC as the arena's TOI. If you step these threads as a group, they proceed in lockstep.

You can combine these letters to further limit which threads the CLI will control. For example, "pS" tells the CLI that it should focus on all threads in the process that participate in the same share group as the TOI.

You can, of course, add a process number to what you type. Here's an example:

gW3

This tells the CLI that it will focus on all worker threads in the control group that contains process 3. The difference between this and pW3 is that pW3 restricts the focus to one of the processes in the control group.

This tip barely scratches the surface of how you specify command focus in the CLI. You will find complete information in Chapter 12 of the TotalView Users Guide.

TotalView comes up in mpirun (or poe or prun or whatever). How can I fix it?


If your program uses a starter program such as mpirun, starting your program means that TotalView will begin by showing the starter program's assembler code. Unless you're debugging the starter program, this isn't what you want. Here's a CLI program that tells TotalView to skip over this code. Add it to your .tvdrc file.

# Set your starter program to poe or mpirun or prun or whatever
set starter_program poe
#
# Check if the newly loaded image is the starter program
# and start it immediately if it is.
#
proc auto_run_starter {loaded_id} {
    global starter_program
    set prog_name [TV::symbol get $loaded_id full_pathname]
    set file_component [file tail $prog_name]

    if {[string compare $file_component $starter_program] == 0} {
        puts "Automatically starting $file_component"
        dgo
    }
}

# Append this procedure to TotalView's image load callbacks so that
# TotalView runs it automatically.
dlappend TV::image_load_callbacks auto_run_starter

You'll find information on starting parallel programs in Chapters 5 and 6 of the "TotalView Users Guide". Information on .tvdrc files is in Chapter 3. The CLI is described in the "TotalView Users Guide" and the "TotalView Reference Guide."


How do I test a signal handler?


The problem with a signal handler is that it's usually invoked when something happens outside of your program or when an error of some kind occurs. With Murphy's Law being what it is, these things never happen while you're debugging your program. You can defeat Murphy if you use the Thread > Continuation Signal command to throw a signal at a thread.
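
If you'd like something small to practice on, here is a minimal C sketch of a handler you could throw a signal at. It assumes a POSIX system and uses SIGUSR1 purely as an example; the function and variable names are made up for this tip.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Written by the handler; sig_atomic_t is safe to set from a handler. */
static volatile sig_atomic_t last_signal = 0;

/* The handler under test: record the signal number and return.
 * Keep the work done inside a handler to a minimum. */
static void on_signal(int signo)
{
    last_signal = signo;
}

/* Installs on_signal for SIGUSR1. Returns 0 on success, -1 on failure. */
int install_sigusr1_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Returns the last signal number the handler saw, or 0 if none. */
int last_signal_seen(void)
{
    return (int)last_signal;
}
```

Call install_sigusr1_handler() early in your program, set a breakpoint in on_signal, and then send SIGUSR1 to the thread from the debugger. Outside the debugger, calling raise(SIGUSR1) exercises the same path.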


What do all of those colors and patterns mean in Process, Variable, and other windows?
There's absolutely no specific meaning for any color or pattern. The colors are there to tell you if something is in the same process or thread. That is, if the colors are the same in different windows, they're related.

How do I get my program to start executing?


Once TotalView displays its Process Window, you're ready to get your program executing. Because almost everyone who uses TotalView has used another debugger, what you're probably doing is setting a breakpoint somewhere, then selecting the Go button in the toolbar.

Here are two alternatives:
  • Select the Process > Create command. This creates and initializes a process. Execution stops before your program begins.
  • Select Step or Next. This runs your program to the first executable line. That is, these commands initialize your program.

Bonus Tip: Diving in the Variable Window
Did you know that you can dive on the data in the Variable Window? The only time you can't dive is when you're seeing a fundamental data type such as an integer or floating point number.

I can't see my program's output because TotalView is writing so much stuff! What do I do?


Using the CLI, you can change the amount of information TotalView writes. For example, type the following in a CLI window:

dset VERBOSE ERROR

This tells TotalView that it should only display error messages.

If you always want this setting, place this statement in your .tvdrc startup file. (In most cases, this is really the setting you want.) See Chapter 3 of the "TotalView Users Guide" for information on startup files.

Why are there two hold commands on the Process menu?


The result of using either the Process > Hold or the Process > Hold Threads command is the same: TotalView holds all the threads in a process until they are released. And there's still another hold command: Thread > Hold. Notice that Process > Hold and Thread > Hold are toggle commands and Process > Hold Threads isn't.

The reason there are two commands that do basically the same thing (and a third that is somewhat related) is that they are used in different circumstances. You would use the Process > Hold command when you want to control the entire process. In contrast, you would use the Hold Threads command when you want more selective control.

For example, suppose you have a process with 32 threads and you want to hold 30 of them. You would use the Hold Threads command to hold all of them, then go to each of the two threads you want to run (perhaps using the T+ and T- controls or selecting them in the Root Window) and release it using the Thread > Hold command.



Held/Release State
What Can Be Run Using Process > Go

This figure shows a process with three threads. Before you do anything, all threads within the process can be run.

Select the Process > Hold toggle. The button will be depressed. The blue indicates that you've held the process.
Nothing will run when you select Process > Go.

Go to the Thread menu. Notice that the button next to the Hold command isn't selected. This is because the thread hold state is independent of the process hold state. Select it. The circle indicates that thread 1 is held. At this time, there are two different holds on thread 1. One is at process level, the other is at thread level.
Nothing will run when you select Process > Go.

Go back to the Process menu and reselect the Hold command.
When you select Process > Go, the 2nd and 3rd threads run.

Select Process > Release Threads. This releases the hold placed on the first thread by the Thread > Hold command.
When you select Process > Go, all threads run.
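
One way to keep the two hold levels straight is to think of them as independent flags. The following C sketch models the walkthrough above for a process with three threads. It is only a mental model under that assumption; the structure and function names are invented here and say nothing about how TotalView is implemented.

```c
#include <stdbool.h>

#define NUM_THREADS 3

/* Illustrative model of the two independent hold levels; not TotalView code. */
struct hold_state {
    bool process_held;             /* toggled by Process > Hold */
    bool thread_held[NUM_THREADS]; /* set by Thread > Hold or Process > Hold Threads */
};

/* A thread runs on Process > Go only if neither hold applies to it. */
bool runs_on_go(const struct hold_state *s, int thread)
{
    return !s->process_held && !s->thread_held[thread];
}

/* Process > Release Threads: clears every thread-level hold. */
void release_threads(struct hold_state *s)
{
    for (int i = 0; i < NUM_THREADS; i++)
        s->thread_held[i] = false;
}
```

In this model, holding the process stops everything, holding thread 1 as well changes nothing visible, clearing the process hold lets the 2nd and 3rd threads run, and releasing the threads lets all three run again.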


How do I debug my OpenMP or MPICH or POE or UPC (or whatever) program?


The ways in which you compile your program and start TotalView differ greatly and are often unique to the environment existing at your site. Because there are so many issues specific to your compiler and runtime environment, you might want to capture that knowledge by customizing the HTML version of the TotalView documentation. This tip tells you how to do that.

The HTML version of our documentation has two dummy pages that you can customize. So, if you go either to the compiling or starting pages in the TotalView documentation, you'll find links to those pages. You can now add any information you like. So, as you discover what works, you can share what you know here.

There are three chapters in the documentation that discuss compiling and running programs:
  • Chapter 8 of the Reference Guide has information on compiling your program and linking dbfork.
  • Chapters 5 and 6 of the Users Guide have lots of information on running parallel programs.

Loading very large libraries when TotalView starts takes a l-o-n-g time


When TotalView loads your program, it reads in shared libraries that your program uses. These libraries contain loader and debugging symbols that TotalView needs. If you have a number of large libraries, you'll have to wait until they all get read and stored.

You can speed things up by entering library names within the Dynamic Libraries Page within the File > Preferences Dialog Box.

As shown in the pulldown list, you can tell TotalView to load all symbols, load no symbols, or just load loader symbols. In most cases, what you'll want to do is tell TotalView to just load loader symbols.

Rather than name each library explicitly, you can use a wild card. For example:

/usr/app/mylibs/*

If you type this into the Load loader symbols list, TotalView will just read in loader symbols for any library contained within this subdirectory that your program uses.

TotalView may eventually read in debugging symbols for these libraries. If your program stops executing when the PC is within a library's code, TotalView will read its debugging symbols.

For more information, see "Controlling Which Symbols TotalView Reads" in the TotalView Users Guide.

How do I display what's stored in memory or at an address?


You can display areas of memory in hexadecimal and decimal. Do this by selecting the View > Lookup Variable command. TotalView displays a dialog box in which you can enter:

  • An address
    If you enter a decimal or hexadecimal address, TotalView displays a Variable Window containing the word of data stored at that address.
  • A pair of addresses
    When you enter a pair of addresses, TotalView displays a Variable Window containing the data (in word increments) from the first to the last address. To enter a pair of addresses, enter the first address, a comma, and the last address.
All hexadecimal constants must have a "0x" prefix.
Here's an example:

Seeing the bits may not be what you want to see. You can change how the data is displayed by typing something into the Type field. For example, you could cast this to integer[8].

After you locate the memory address, you may want to use the View > Examine Format command to view this information.
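
If you're wondering what "a pair of addresses" and a cast to integer[8] amount to, this C sketch may help. It assumes 32-bit words, which is an assumption for illustration only (your platform's word size may differ), and both function names are made up for this tip.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of 32-bit words covered when displaying from the first to the
 * last address, inclusive. Returns 0 if the range is reversed. */
size_t words_in_range(uintptr_t first, uintptr_t last)
{
    if (last < first)
        return 0;
    return (size_t)((last - first) / sizeof(uint32_t)) + 1;
}

/* Reads count ints starting at a raw address, much as a cast to an int
 * array reinterprets the same bytes. memcpy avoids alignment and
 * aliasing problems that a direct pointer cast could cause. */
void view_as_ints(const void *address, int *out, size_t count)
{
    memcpy(out, address, count * sizeof *out);
}
```

Under the 32-bit assumption, a range such as 0x1000 through 0x101c covers eight words, which is why the same block of memory can also be viewed as an array of eight 4-byte integers.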

Returning to where you started


Debugging is a process. Debugging a program means examining a program's logic and its variables. As you are looking for a bug, you are looking at functions, searching for text, and diving on variables. In the Variable Window, you may be casting a variable's data type, diving on pointers or substructures.

At some time, you may need to get the Source Pane back to the way it was when you halted execution. The easiest way to get back to the PC is by using the View > Reset command. Actually, I don't know anyone who goes to the menubar and selects this command. Instead, they use its accelerator, which is Ctrl+R.

The Variable Window's Edit > Reset Defaults command removes many of the changes you make to the contents of a variable window.


Why don't my print statements print when I step over them?


The output of most print statements is buffered. This means that you won't see what you are printing until the buffer is flushed. If you don't want to wait for this to happen, you can place a call to fflush() in an eval point on the line after the print statement.

If you need to do this for a lot of print statements, you might want to use the CLI:

d1.<> set lines {1112 1623 185 2362}
1112 1623 185 2362
d1.<> foreach line $lines {
    dbreak $line -e {fflush();}
}
d1.<>
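
If you are able to change the program itself, you can also deal with buffering in the source instead of with eval points. Here is a minimal C sketch of two common approaches using the standard setvbuf() and fflush() calls. The function names are made up for this tip, and whether you want either change in production code is a separate question.

```c
#include <stdio.h>

/* Makes stdout unbuffered so each printf() appears immediately.
 * Call this before any other operation on stdout.
 * Returns 0 on success, nonzero on failure. */
int make_stdout_unbuffered(void)
{
    return setvbuf(stdout, NULL, _IONBF, 0);
}

/* Prints a message and flushes right away, which has the same effect as
 * an fflush() eval point placed after the print statement.
 * Returns 0 on success, EOF on failure. */
int print_and_flush(const char *message)
{
    if (fputs(message, stdout) == EOF)
        return EOF;
    return fflush(stdout);
}
```

Calling make_stdout_unbuffered() once at the top of main() affects every later print to stdout, while print_and_flush() lets you pick only the statements you care about.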

There are too many threads running. What are they?


When you are running a multi-threaded program, you see lots of threads. Some were created by your program; others weren't.

For example, your program will not have created the threads having a negative thread ID (TID). What's going on is that the operating system is creating additional threads to do some work for you. These threads are called manager threads.

In most cases, TotalView can identify these manager threads and assign a negative TID value to them so you know they aren't part of your program. In almost all cases, you can ignore them. If you don't want TotalView to show them to you, select the View > Display Manager Threads toggle command.

See Chapter 12 of the "TotalView Users Guide" for more information.

How do I see different processes at the same time?


If you are debugging a multiprocess program, you may want to have more than one Process Window open. Each window would have its own process. You can do this in two ways:
  • Select the Process Window's Window > Duplicate command. After the new Process Window appears, select a process in the Processes to change to another process.
  • In the Root Window, place your mouse over one of the processes or threads, right click, then select Dive in New Window.


Stepping and assembler


If you are displaying source rather than assembler in the Process Window's Source Pane, you can use a Step Instruction command to move directly from executing a source statement to executing an assembler instruction. This is often useful when stepping into library routines.

For example, suppose the PC is pointing to a function named foo(). Selecting Step Instruction executes the first assembler instruction within the code that sets up the jump to foo().

If you are displaying source and assembler, you need to use the instruction stepping commands to tell TotalView what to do. If, however, you are just displaying assembler, the Next and Step commands perform the same operation as Next Instruction and Step Instruction.

The only time selection matters when you're displaying source and assembler is when you select a line, then select the Run To command. For example, selecting a line in the assembler area runs the PC to the selected assembler line.

What's an action point?


You probably know what a breakpoint is from using other debuggers. TotalView has extended the notion of what a breakpoint is and we call the different kinds of things you can set "action points."

  • Breakpoints
    When a process or thread encounters a breakpoint, TotalView stops execution just before the line executes. A breakpoint is the simplest kind of action point.
  • Barrier breakpoints
    Barrier breakpoints are similar to standard breakpoints in that they also stop a process or a thread. They differ in that other processes or threads keep running until they too reach the barrier. In this way, the barrier lets you synchronize a group of related processes or threads.
  • Evaluation points
    An evaluation point is a breakpoint that has code you create associated with it. Other debuggers sometimes call these things actions. When your program hits an evaluation point, it executes the evaluation point's code.

    You can use an evaluation point, for example, to tell TotalView to stop execution when some condition you define is true or to call a function or to patch your program or to stop execution when a loop executes the number of iterations you indicate.
  • Watchpoints
    A watchpoint tells TotalView to do something when a variable's value changes. TotalView can either stop executing or evaluate some code that you associate with the watchpoint.
For more information, see Chapter 15 of the "TotalView Users Guide."

Is there a quick way to set a breakpoint on a function?


Use the Action Point > At Location command.

After you type a function name, TotalView sets a breakpoint on the first executable statement within the function.

Bonus Tip: Shift-Clicking Between Barrier Points and Breakpoints
You can change a breakpoint into a barrier point by shift-clicking on the stop icon. Shift-clicking shifts it back to a breakpoint.

What's the difference between TotalView's stepping commands?


In the following figure, the PC is at line 15. The arrows in the figure indicate where the PC will be after you invoke a stepping command.


Here's what the four stepping commands do:
  • Next executes line 15. On completion, the PC will be at line 16.
  • Step moves into the sub2() function. The PC will be at line 21.
  • Run To executes all lines until the PC reaches the selected line, which is line 23.
  • Out executes all statements within sub1() and exits from the function. The PC will be at line 9. If you now execute a Step command, TotalView steps into sub3().
What happens when you use the Out command is not quite as straightforward as it might appear to be. This will be discussed in next week's tip.

Where does the `Out' command stop?


In the following example, I'll run the program until line 9:
j = sub1(i); k = sub3(j);

and then step into sub1(). In the following illustration, you can see where the PC is. Also, the Expression List Window shows the j and k variables. The values you are seeing are just garbage.

After selecting the Out button, here's what you'll see:

Notice that the value of j has not yet changed! If you were to look at the assembly code, you'd see that while TotalView has returned from the subroutine, it has not yet reached the assignment statement.

After selecting the Step command, here's what you'll see:

Variable j has finally been updated. Here's what you'll see after selecting the Out button to exit from sub3():

In the same way that j was not assigned a new value after the first Out command, variable k is also not assigned a new value.

Finally, after selecting the Step button, everything is updated.


Stepping into routines that are parameters of routines


A previous tip used a step command to step into a routine and an out command to return from it. Here's a line from that program:

j = sub1(i); k = sub3(j);

What if your line of code were:

i = sub1(sub2(j),sub3(k));

Suppose execution is stopped before this line executes. Which subroutine will a step command step into? When I compiled a program having this line using gcc on a Linux x86 system, a step command went into sub2(). However, when I compiled the program using IBM's Visual Age on an RS/6000, a step command went into sub3().

Clearly, the order in which the arguments get evaluated is compiler-dependent.

Here's another example. Suppose this were your code:

k = 0;
i = sub1(++k, ++k);

What gets sent to sub1()? Is it:

i = sub1(1,2);

or

i = sub1(2,1);

In this example, it is clear that a problem exists. However, what if sub2() and sub3() manipulated the same global variable? If they did, a very subtle bug could be introduced. You'd be better off doing something like:

m = sub2(j); n = sub3(k);
i = sub1(m, n);
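To make the shared-global case concrete, here's a small sketch in C. The bodies of sub1(), sub2(), and sub3() are invented for the illustration (the tip doesn't say what they do), and the variable named shared stands in for the global they both touch.

```c
#include <assert.h>

static int shared = 0;   /* hypothetical global touched by both helpers */

static int sub2(int j) { shared = shared * 2 + j; return shared; }
static int sub3(int k) { shared = shared + k;     return shared; }
static int sub1(int a, int b) { return a * 10 + b; }

/* Order-dependent: the compiler is free to call sub2() or sub3() first,
 * so the result can differ from one compiler to the next. */
int call_unsequenced(int j, int k)
{
    shared = 1;
    return sub1(sub2(j), sub3(k));
}

/* Order fixed by the programmer: sub2() always runs before sub3(). */
int call_sequenced(int j, int k)
{
    shared = 1;
    int m = sub2(j);
    int n = sub3(k);
    assert(n == m + k);   /* holds only because sub2() ran first */
    return sub1(m, n);
}
```

With j = 3 and k = 4, call_sequenced() always returns 59. call_unsequenced() returns 59 if sub2() happens to run first and 135 if sub3() does, which is exactly the kind of difference that only shows up when you change compilers.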

What's the difference between a breakpoint and a barrier breakpoint?


While both are action points, their roles are very different. In general, the reason you set a breakpoint is to examine your program's state just before the breakpointed line executes or to begin tracing how it will execute. In contrast, the reason you set a barrier point is to synchronize the execution of your program's processes and threads.

To make things simpler, I'll just talk about process barriers. Thread barriers work similarly.

When you set a barrier, you are setting it in each process in a group of processes. When any of these processes hits the barrier, it stops executing. When a second executing process reaches the barrier, it also stops executing. Eventually, TotalView will have stopped all of these processes at the barrier's location.

Because TotalView is not allowing your processes to execute past the barrier, it is synchronizing these processes to the barrier's location. This means that you can allow your processes to run freely at other times because you can always gather them together when you need to execute them synchronously.

Why can't I get a process (or thread) held at a barrier to execute?


Creating a barrier point tells TotalView that it should hold a process when it reaches the barrier. Other processes that can reach the barrier but aren't yet at it continue executing. One-by-one, processes reach the barrier and, when they do, TotalView holds them.

When a process is held, it ignores commands that tell it to execute. This means, for example, that you can't tell it to go or to step. If, for some reason, you want the process to execute, you can manually release it using either the Group > Release or Process > Release Threads command.

When all processes that share a barrier reach it, TotalView changes their state from held to released, which means that they will no longer ignore a command that tells them to begin executing.

The following figure shows seven processes that are sharing the same barrier. (Processes that aren't affected by the barrier aren't shown.)
  • First Block: All seven are running freely.
  • Second Block: One process hits the barrier and is held. Six are executing.
  • Third Block: Five of the processes have now hit the barrier and are being held. Two are executing.
  • Fourth Block: All have hit the barrier. Because TotalView isn't waiting for anything else to reach the barrier, it changes the processes' state to released. Although they are released, none are executing.
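If it helps to think of the hold and release logic as code, here's a toy model in C of the four blocks described above. It is only a sketch of the bookkeeping, not how TotalView is implemented, and every name in it (barrier_model, barrier_hit(), and so on) is made up for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PROCS 16

/* Toy model of the set of processes that share one barrier. */
struct barrier_model {
    int  expected;           /* processes that share the barrier  */
    int  arrived;            /* processes that have hit it so far */
    bool held[MAX_PROCS];    /* true while a process is held      */
};

void barrier_init(struct barrier_model *b, int expected)
{
    assert(expected > 0 && expected <= MAX_PROCS);
    b->expected = expected;
    b->arrived = 0;
    for (int i = 0; i < MAX_PROCS; i++)
        b->held[i] = false;
}

/* Process `pid` reaches the barrier. Returns true when this arrival
 * satisfies the barrier, at which point every hold is released. */
bool barrier_hit(struct barrier_model *b, int pid)
{
    assert(pid >= 0 && pid < b->expected && !b->held[pid]);
    b->held[pid] = true;
    b->arrived++;
    if (b->arrived < b->expected)
        return false;                /* still waiting for the others */
    for (int i = 0; i < b->expected; i++)
        b->held[i] = false;          /* satisfied: release everyone  */
    b->arrived = 0;
    return true;
}

/* Number of processes currently held at the barrier. */
int barrier_held_count(const struct barrier_model *b)
{
    int n = 0;
    for (int i = 0; i < b->expected; i++)
        if (b->held[i])
            n++;
    return n;
}
```

With seven processes, the held count climbs from 0 to 6 as processes arrive one by one, and drops back to 0 the moment the seventh arrives. Released doesn't mean running: nothing executes until you issue a go or step command.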


Why should I use barrier points instead of breakpoints when debugging multi-process and multi-threaded programs?


Because threads and processes can be doing different things, keeping things together is difficult. The best strategy is to define places where the program can run freely and places where you need to keep things under control. This is where barrier points come in.

The same things are true for multi-threaded programs.

Why breakpoints don't work (part 1)
If you set a breakpoint that stops all processes when it is hit and you let your processes run using Group > Go, you can get lucky and all of them will be at the breakpoint. What's more likely is that some processes won't have reached the breakpoint and TotalView will have stopped them wherever they happen to be. To get things synchronized, you'll need to find out which ones didn't get there and then individually get them to the breakpoint using Process > Go. You can't use Group > Go as this will also run the processes stopped at the breakpoint.

Why breakpoints don't work (part 2)
If you set the breakpoint's property so that only the process hitting the breakpoint will stop, you'll have a better chance of getting them there. However, you'd better not have other breakpoints between where the program is currently at and this breakpoint. If processes hit them, you are once again left running individual processes to the breakpoint.

Why single stepping doesn't work
Single stepping is just too tedious if you have a long way to go to get to your synchronization point and stepping just won't work if your processes don't execute exactly the same code (what happens when you hit an "if" statement?).

Barrier points work!
If you use a barrier point, you can set it and then select Group > Go as many times as it takes to get all of your processes to the barrier, and you won't have to worry about a process running past the barrier.

The Root Window shows you which processes have hit the barrier. It marks all held processes with an "H" in the column immediately to the right of the state codes. When all have reached the barrier, TotalView removes all "H"s.

How do the `When Hit, Stop' options change barrier and breakpoint behavior?


The answer to this requires an answer to another question first: What's the difference between a barrier and a breakpoint? Answer: A barrier has a hold/release state. For more information, see this previous tip.

Here's a quick summary of the possible behaviors:


  • Breakpoint, Stop Group
    When anything hits, everything in the control group stops.
  • Breakpoint, Stop Process or Thread
    When something hits, this stops and other things keep running.
  • Barrier point, Stop Group
    When anything hits, everything in the control group stops and the thing hitting gets held.
  • Barrier point, Stop Process or Thread
    When something hits, this stops and other things keep running and the thing hitting gets held.
The most often used combinations are:
  • Breakpoint stopping process or thread. All things should eventually reach the breakpoint, which means that everything will be synchronized to the same place. As a process or thread hits the breakpoint, it stops. You don't have to do anything to get everything synchronized.
  • Barrier points stopping group. It always seems to happen that something goes off in a different direction so that it won't hit the breakpoint. Stopping everything lets you find out where things are and what they're doing. You can then get them executing again without executing the thing that hit the barrier. The reason: a barrier places a "hold" on things.
When you use a barrier, you can use a go command without worrying about the thing that hit the barrier doing anything. It'll just stay there, ignoring the execution command until everything reaches the barrier. In contrast, using a go command with a breakpoint also tells the thing hitting the breakpoint to execute, which means you'll never get things synchronized.

When you have a great number of processes and threads, knowing what they are doing and where they are doing it can become a major issue. This is where the Root Window comes in. (I've noticed that most people seem to ignore it.) The Root Window is really a control console that tells you, among other things, what is stopped and why it is stopped. (Another tip described what you'll see in this window.) So, when you tell your program to execute and something doesn't, the "H" in the Root Window's Attached Page tells you why.

If you combine this information with the information in the Root Window's Groups Page, it's pretty easy to keep track of what will happen when you tell your program to execute. So, if you change the "When Hit" options in the Action Point > Properties dialog box, this page shows you what else has stopped (or hasn't stopped).

Why do some of my processes get stuck on a barrier?


In almost all cases, you've placed a barrier on a line that is only conditionally executed. Here's a simple conditional:

if (i_am_cold) {
    turn_on_heat();
} else {
    open_windows();
}

If you place a barrier on the turn_on_heat() function, the processes that take the other branch of the if statement will never hit the barrier. Because the barrier will never be satisfied ("satisfied" means that all of the processes have hit the barrier), TotalView will never release the processes held at turn_on_heat().

Placing a barrier at both function calls doesn't help. Eventually, TotalView will be holding some processes at one place and the rest at the other. While everything is held, neither barrier is satisfied. This is because each needs the processes at the other barrier to reach it before TotalView will release that barrier's held processes. This means you can click on step and go commands until your mouse breaks and nothing will execute.

To get around this problem, create process-level breakpoints.

The moral of this tip is that you should only plant barriers in sections of code that are executed by all processes in the satisfaction set.
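One way to follow that advice is to give every process a line they all execute after the branches rejoin, and plant the barrier there. Here's a sketch in C that extends the conditional above; climate_adjusted() and adjust_climate() are hypothetical names added for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

static int heat_on = 0;
static int windows_open = 0;

static void turn_on_heat(void) { heat_on = 1; }
static void open_windows(void) { windows_open = 1; }

/* Hypothetical rendezvous routine: every process calls it no matter
 * which branch it took, so a barrier set here can be satisfied. */
static int climate_adjusted(void) { return heat_on + windows_open; }

int adjust_climate(bool i_am_cold)
{
    heat_on = windows_open = 0;
    if (i_am_cold) {
        turn_on_heat();      /* only some processes get here */
    } else {
        open_windows();      /* the rest get here instead    */
    }
    int actions = climate_adjusted();   /* all processes reach this line */
    assert(actions == 1);               /* exactly one branch was taken  */
    return actions;
}
```

A barrier on the call to climate_adjusted() is hit by cold and warm processes alike, so it can be satisfied. A barrier on either branch can't.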

How do I know what triggered an exception?


If you have a C++ application that uses exceptions, you may need to see which statement was executing when the exception was thrown. Unfortunately, if you stop your process while it is inside an exception handler, you can't determine where the throw occurred.

While setting a breakpoint inside the exception seems like it would get you what you want, it's the wrong place. Instead, you want to set your breakpoint on the throw routine in your runtime library. The exact name of the function varies from platform to platform. On Linux, its name is _throw. (If you use the View > Lookup Function command, TotalView will usually find the right routine.)

Now when an exception occurs, the process stops when it enters the throw routine and, more importantly, the call stack still exists because your program will not yet have unwound it. You can now click on the routines listed in the Stack Trace Pane to see where the program was executing when the exception was thrown.

One tips reader wrote in about a problem setting a breakpoint in a throw. This tip is an edited version of the email reply sent back to him by our support organization.

Tips Writer:
What do you do when you want to set a breakpoint on a function and TotalView tells you that the function you've named is ambiguous?

Tips Reader: I'm trying (and failing) to set a breakpoint in _throw on Linux. As this suggestion comes from your Tips' database, this surprised me.
Also, TotalView reports that _throw is an ambiguous function when I use the View > Lookup Function command.

Support:
That is because TotalView finds two symbols. If you click Show full path names, you can distinguish between them. Choose the one in the C++ runtime library.

Alternatively, you can pick and choose which exceptions you want caught. For example, if you are only interested in exceptions thrown from routines defined in your program, set the breakpoint there. If you do this, TotalView does not stop your program if a shared library throws an exception.

Tips Writer: What do you do when TotalView cannot even find the function?

Tips Reader: On Solaris under dbx, I can set a breakpoint on ex_throw that is hit when an exception is thrown. However, TotalView cannot find this function.

Support: Are you unable to find it or is it ambiguous? If it is ambiguous, choose the function in the runtime library. Another reason that TotalView may not find the function is that TotalView does not demangle loader symbols.

The trick to finding the mangled name is to look in the runtime library:

% nm /usr/lib/libCrun.so.1 | grep ex_throw
[192] | 19112| 68|FUNC |GLOB |0 |12 |__1cG__CrunIex_throw6Fpvpkn0AQstatic_type_info_pF1_v_v_
[301] | 19736| 512|FUNC |GLOB |0 |12 |__1cG__CrunVex_throw_with_context6Fpvpkn0AQstatic_type_info_pi_v_
[84] | 19180| 540|FUNC |LOCL |0 |12 |_ex_throw_body

So, all you need to do is to set a breakpoint on the following function:

__1cG__CrunIex_throw6Fpvpkn0AQstatic_type_info_pF1_v_v_

This isn't easy to remember so on Sun, you can set an alias:

alias tvcommand "totalview -e 'dbreak \
__1cG__CrunIex_throw6Fpvpkn0AQstatic_type_info_pF1_v_v_'"

Now, just type tvcommand to set the breakpoint.

Finally, if you are using exception classes, you can place breakpoints in your exception constructors.

I've found the bug, how do I quickly test my fix?


Now that you've found what you think the problem is, you need to, you think, edit your program's source, recompile your program, and then restart TotalView to check that what you think works actually does work.

While you will eventually have to do this, there's a quick way to patch your program: just put your patch into an eval point and add a goto statement to branch around the bad code.

For example, here's what to do if lines 1100 through 1110 need changing:
  • Add an eval point at line 1100. TotalView processes eval points before the line on which they're set.
  • Type your code.
  • If you're debugging a C program, make the last statement goto 1111. If it's a Fortran program, type goto $1111.

Exiting from TotalView and restarting after I recompile takes a long time


So you've found your bug, changed some source code, and recompiled your program. What you've always done is exit from your debugger and then use a shell command to restart it. Unfortunately, when you do this, you destroy the continuity of your debugging session by taking down all the windows you've opened. What's the solution?

People exit from TotalView because that's what other debuggers force them to do. TotalView doesn't. If you recompile your program and then restart it, TotalView automatically detects what you've done and refreshes its information.

How do I tell my program to stop at a breakpoint every 100 times a loop executes rather than every time?


Create an eval point, then use the $count function. Here's what you'd enter to get a loop to stop after 100 iterations:

$count 100
TotalView's internal counter is a "static" variable, which means that TotalView remembers every time it executes the eval point. Suppose you create this eval point within a loop that executes 120 times and that the loop is within a subroutine. As expected, TotalView stops execution the 100th time the eval point executes. When you resume execution, the remaining 20 iterations occur.

The next time the subroutine executes, TotalView stops execution after 80 iterations because it will have counted the 20 iterations from the last time the subroutine executed.

This isn't a bug that we're documenting as a feature. Suppose you have a function that is called from lots of different places from within your program. Because TotalView remembers every time a statement executes, you could, for example, stop execution every 100 times the function is called. In other words, while $count is most often used within loops, it can be used outside of them as well.
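The arithmetic in the 120-iteration example is easy to check with a toy model. The following C sketch imitates a counter that persists between calls; it is an illustration of the behavior described above, not TotalView's implementation, and the function names are made up.

```c
#include <assert.h>

/* Toy stand-in for the counter behind "$count 100". It persists across
 * calls, the way a "static" variable does. */
static int hits = 0;

/* Call once each time the eval point executes. Returns 1 when execution
 * would stop, which happens on every `limit`-th hit. */
int count_eval_point(int limit)
{
    assert(limit > 0);
    if (++hits < limit)
        return 0;
    hits = 0;             /* start counting toward the next stop */
    return 1;
}

/* Run a loop of `iterations` passes through the eval point. Returns the
 * 1-based iteration of the first stop, or 0 if the loop never stops. */
int first_stop_in_loop(int iterations, int limit)
{
    int first = 0;
    for (int i = 1; i <= iterations; i++)
        if (count_eval_point(limit) && first == 0)
            first = i;
    return first;
}
```

The first call to first_stop_in_loop(120, 100) returns 100 and leaves 20 hits on the counter. The second call returns 80, because those 20 leftover hits still count.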


I like debugging using printf statements. Can you help me?


While TotalView does a really good job showing things, we understand that once in a great while (not too often, of course) you may want to use a printf() command to see something.

TotalView gives you an easy way to do this if you program using C or C++. Wherever you want to add one, insert an evaluation point and type a printf() statement. Now, whenever the evaluation point is hit, TotalView executes your code. Here's an example:

After this gets printed, your program continues executing.

How do I use breakpoints that someone else saved?


If someone else is debugging the same program that you're debugging and each of you is debugging your own version, you can use the other person's breakpoints if you use the CLI. While the GUI always saves your breakpoints into the same directory as your program and doesn't give you any choice what name it uses for this file, you have more control over things in the CLI.

Here are the commands that read and write breakpoint files:
  • dactions -load pathname tells TotalView to load the breakpoint file named pathname.
  • dactions -save pathname tells TotalView to save the breakpoint file to the file pathname.
There's no default extension, so you must type the complete file or pathname. This also means that you can save your breakpoints into a file named foo.bar if you want to.

How do I sit back and watch TotalView step through my program?


There are times when you want to sit back and watch TotalView step through your program. For example, you might like to step and see how variables are altered, then step again and see these variables, and then step again, and so on.

The following macros let you do this (sort of):
proc slowly {{what step} {interval 1}} {
    global slow_command

    # Schedule slow_body to run after the interval (converted to milliseconds).
    set slow_command [list after \
        [expr {$interval * 1000}] [list slow_body $what]]

    eval $slow_command
}

proc slow_body {command} {
    global slow_command

    #
    # If the step or next command fails we quit.
    # The process probably SEGV'd or something equally bad.
    #
    if {[catch $command]} return

    dwait

    # Check if the process is in a state we don't want to leave.
    set state [capture dstatus]
    regsub -all { +} $state " " state
    set state [lindex [split $state " "] 2]

    if {[string compare $state "Breakpoint"] == 0} {
        puts stdout "At breakpoint, stopping slow $command\n"
        return
    }
    if {[string compare $state "Nonexistent"] == 0} {
        puts stdout "Process exited, stopping slow $command\n"
        return
    }

    # Resubmit the after command unless x has cleared it.
    if {$slow_command != ""} {
        eval $slow_command
    }
}

# Stop the slow stepping by clearing the scheduled command.
proc x {} {
    global slow_command
    set slow_command ""
}

Notice the "x" proc. Because the macro is continually resubmitting itself, you need a way to tell slow_step to stop executing. The name "x" was chosen because it's easy to type. You may want to rename it to something that is more descriptive.

Here's how to use it:

  • After loading a program and getting it set up the way you want it to be, open the CLI by using the Tools > Command Line command.
  • Use the source command to load this script. For example:
    source slow_step.tcl
  • Whenever you want to use it, type something like:

    slowly dstep 3

    This tells TotalView to step your program every 3 seconds. That is, the first argument is the command to execute and the second argument is how often the CLI should execute the command.
  • To stop stepping, type "x". (This is the reason the command's name is s-o-o-o short.)
As this program executes, the GUI window updates, as do any Variable Windows that you may have open.

The code shown here hasn't been tested in multi-process and multi-threaded programs. It may break when repeatedly re-executing complicated multi-threaded code. But, it does an adequate job of just letting you sit back and see how your program is executing.

proc slowly
Tcl doesn't have a sleep command. However, like JavaScript, it has a scheduler, which means you can use the after command to tell Tcl to do something later. Using a scheduler to perform an activity is much different than writing a loop and placing a sleep command within it. While other differences exist, the one that is important here is that each time a scheduler-based command executes, it must resubmit itself. Consequently, a mechanism is needed to continuously submit the command and another mechanism is needed to stop the command from resubmitting itself.

The slowly procedure creates a list containing the Tcl after command. After building the list, Tcl evaluates it. Note that after this command executes, it is no longer used. Instead, the command invoked by the after command, which is slow_body, contains the statement that will reevaluate the string containing the after command.

proc x
The x routine simply sets the global list containing the after command to the empty string. Just before evaluating the created list, slow_body checks whether it is non-empty. If it is non-empty, it gets submitted. In this way, executing the x command stops the slow stepping action.

proc slow_body
  • The stepping command is performed within a catch so that the slow_body procedure will stop if a problem occurs.
  • dwait tells TotalView that it should wait for all other threads and processes that are involved in stepping to stop executing. In other words, it tries to keep everything together.
  • dstatus captures the current state into a variable.
  • The program then checks for a string within the variable. If you are at a breakpoint, execution terminates. If the process no longer exists, execution also terminates.
  • Depending upon how thorough you wanted to be, you could check for other things, but this is usually enough.
  • The slow_command variable is evaluated. This variable is where we previously stored the slow_body command and its arguments. Consequently, this statement is sending the command back to the scheduler so it can be executed at a later time.
The most interesting thing here is the use of the capture command. Tcl has two ways of displaying information. In some circumstances, it is returned. In others, it is printed. If it is returned, you can assign the value directly into a variable. If, however, it is displayed, you need to use the capture command to help you set the variable.

"capture" is a command that we've added to the Tcl environment. There are a few others. The commands that we've added that don't really drive TotalView do not begin with the letter "d", like dstep and dgo do.

How does a Watchpoint differ from a breakpoint?


A watchpoint looks a lot like a breakpoint. While both stop execution, the reasons they stop execution differ.

When you set a watchpoint, you are telling TotalView to watch memory locations. A watchpoint triggers (that is, performs an action) if the value within the memory location changes. In contrast, a breakpoint triggers just before a line of code executes. By default, the trigger action stops execution. However, you can modify the behavior so that TotalView evaluates an expression at that time.

Because a watchpoint is watching memory, it doesn't really know or care about variables. In other words, while the way in which you set a watchpoint seems to say "watch variable foo," the reality is that you are actually saying "watch the memory location in which the foo variable's value is stored." One consequence of this is that if you tell TotalView to watch a stack variable, it will be watching memory that is "owned" by different variables as your program executes.

Hardware watchpoints


A TotalView watchpoint is based on your hardware. That is, we rely on features provided to us by the hardware and, to some extent, the operating system. We use special registers and other features that differ from computer-to-computer.

Because our watchpoints are based on hardware, they don't add a lot of overhead.

In contrast, a software watchpoint requires that execution be stopped every instruction so that a debugger can check if a value changed. This approach can severely impact performance. If a program executes on one CPU, performance is ponderous. On a multithreaded or multiprocess job, performance becomes ludicrous (which is why TotalView doesn't do it this way).

The problem with hardware watchpoints is that there are some architectures where there aren't a lot of resources available, which means the number of watchpoints you can have is limited. The following table describes what TotalView can do:


Computer and constraints:
  • HP Alpha Tru64
    Tru64 places no limitations on the number of watchpoints that you can create, and no alignment or size constraints. However, watchpoints can't overlap, and you can't create a watchpoint on an already write-protected page.

    Watchpoints use a page-protection scheme. Because the page size is 8,192 bytes, watchpoints can degrade performance if your program frequently writes to pages that contain watchpoints.
  • IBM AIX
    You can create one watchpoint on AIX 4.3.3.0-2 (AIX 4.3R) or later systems running 64-bit chips. These are Power3 and Power4 systems. (AIX 4.3R is available as APAR IY06844.) A watchpoint cannot be longer than 8 bytes, and you must align it within an 8-byte boundary. If your watchpoint is less than 8 bytes and it doesn't span an 8-byte boundary, TotalView figures out what to do.
  • IRIX6 MIPS
    Watchpoints are implemented on IRIX 6.2 and later operating systems. These systems let you create approximately 100 watchpoints. There are no alignment or size constraints. However, watchpoints can't overlap.
  • Linux x86, Linux IA-64, Linux x86-64 (AMD and Intel)
    You can create up to four watchpoints and each must be 1, 2, or 4 bytes in length, and a memory address must be aligned for the byte length. That is, you must align a 4-byte watchpoint on a 4-byte address boundary, and you must align a 2-byte watchpoint on a 2-byte boundary, and so on.
  • HP-UX IA-64
    You can create up to four watchpoints. The length of the memory being watched must be a power of 2 and the address must be aligned to that power of 2; that is, (address % length) == 0.
  • Solaris SPARC
    TotalView supports watchpoints on Solaris 7 or later operating systems. These operating systems let you create hundreds of watchpoints, and there are no alignment or size constraints. However, watchpoints can't overlap.
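The alignment rules for Linux and HP-UX IA-64 are simple enough to write down as code. Here's a sketch in C that checks an address and length against those two sets of constraints as they are worded above. The function names are made up for the illustration, and the checks say nothing about the separate limit of four watchpoints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool is_power_of_two(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* HP-UX IA-64 rule: the length is a power of 2 and the address is
 * aligned to it, that is, (address % length) == 0. */
bool fits_power_of_two_rule(uintptr_t address, size_t length)
{
    return is_power_of_two(length) && (address % length) == 0;
}

/* Linux rule: the length is 1, 2, or 4 bytes and the address is
 * aligned for that length. */
bool fits_linux_rule(uintptr_t address, size_t length)
{
    return length <= 4 && fits_power_of_two_rule(address, length);
}

/* Round an address down to the nearest boundary of `length` bytes. */
uintptr_t align_down(uintptr_t address, size_t length)
{
    assert(is_power_of_two(length));
    return address & ~((uintptr_t)length - 1);
}
```

For example, a 4-byte watch at address 0x1002 fails the Linux rule because 0x1002 isn't on a 4-byte boundary, while a 2-byte watch at the same address passes.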


How do I stop execution when a variable's data changes?


Use a watchpoint.
A watchpoint is like a breakpoint. While a breakpoint activates just before a line in your program executes, a watchpoint activates whenever the contents of the memory locations associated with a variable change.

You can create a watchpoint by using the Variable Window's Tools > Watchpoint command.

Because TotalView is watching a memory location, it doesn't matter what causes the change to occur. It could be the variable upon which you've set the watchpoint or it could be because you've done something using a pointer or it could be because an array overstepped its bounds. Any action that changes this memory stops your program.
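Here's a tiny C sketch of the pointer case. Imagine a watchpoint on the variable named watched; both functions below change the same memory, but only one of them ever mentions the variable by name. The names are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

int watched = 0;   /* imagine a watchpoint set on this variable */

/* The obvious way for the value to change: assign to the variable. */
void set_directly(int v)
{
    watched = v;
}

/* The less obvious way: write through a pointer. If p happens to hold
 * the address of `watched`, its memory changes even though this code
 * never names the variable. */
void set_through_pointer(int *p, int v)
{
    assert(p != NULL);
    *p = v;
}
```

Calling set_through_pointer(&watched, 2) changes the watched memory just as set_directly(2) does, so either call would stop a program with a watchpoint on that location.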

Unlike a breakpoint, a watchpoint stops your program after a statement executes, not before. If it stopped execution before, it would be stopping before the value changes.

If you set a watchpoint on memory allocated on the stack, you may not get the results you expect. After the routine using the stack frees this memory, it can be reused by another routine. In this case, changes to a watched memory location probably don't have much meaning.

How do I see what the value of a variable was before a watchpoint triggers?


A watchpoint triggers whenever the value stored in a memory location changes. Once in a while, you may need to see what this value was before your program changed it.

Assume that you have a global variable into which you can store the value. Here's what to do:
  1. Dive on the variable.
  2. Select Tools > Watchpoint to create a watchpoint on that variable's memory location.
  3. Select the Conditional radio button at the top of the dialog box.
  4. Type an expression. Here's an example:

    The $oldval variable is built in to TotalView. It contains the location's value before the change occurred. Now that this previous value is assigned to the OldValue global variable, you can dive on the global to see what the previous value was.
  5. After pressing OK, you can run your program.
    In this example, when the previous value is greater than 100, the program stops. If you dive on the OldValue variable, you'll see what this previous value was.

This example used a condition to determine when TotalView should stop your program. You don't need to use one. You could have just typed $stop. If you had done that, TotalView would stop every time the value changes.
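
The expression for step 4 is not reproduced in the steps above. Judging from the description, it presumably looks something like OldValue = $oldval; if ($oldval > 100) $stop; but treat that as a guess rather than the original text. The following C sketch emulates that assumed logic outside of TotalView: the parameter oldval stands in for $oldval, and the return value stands in for $stop.

```c
/* Global that receives the previous value, as in the steps above. */
int OldValue;

/* Emulates the assumed conditional watchpoint in plain C.
 * Saves the previous value, then returns 1 (stop the program) when
 * that previous value is greater than 100, or 0 (keep running). */
int on_watch_trigger(int oldval)
{
    OldValue = oldval;
    return oldval > 100;
}
```

Whether the original expression saved the value on every change or only when the program stops is not stated; this sketch saves it every time.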

Why can you only use global variables in conditional watchpoints?


When creating a conditional watchpoint, all variables must either be global or be created within the conditional's expression.

This restriction does not exist when creating a conditional expression within an eval point. This is because TotalView executes eval points in the context of the line at which you had created it. Because local variables will be allocated within the stack, TotalView can use them.

In contrast, a watchpoint is associated with a memory location. The code that changes this memory location could be in the same stack frame or it could be in a different stack frame. If the original stack frame doesn't exist or isn't current when the watchpoint triggers, TotalView wouldn't be able to access the local variable. This is why this restriction exists.

Why do I need a watchpoint variable to access the current value?


Why do I need TotalView's $newval variable? I can just dive to find out what the current value is.

As last week's tip discussed, TotalView isn't really watching a variable. Instead, it is watching a memory location. When the contents of this memory location change, TotalView stops your program. When it does, you can, of course, dive to display the current value.

Sometimes, however, you only want to stop your program when something specific happens. For example, you might only want to stop execution if a value is less than zero. If a lot of statements are writing into a global data structure, it is inconvenient to set an evaluation point on every line that could make a change. This is where conditional watchpoints come in: watchpoints don't care which statement makes a change.

Since watchpoints cannot use variable names, you need another method to get at this value. This is what $newval is for.

For example, here's what you might type to find out when the contents of a memory location become less than zero:

if ($oldval >= 0 && $newval < 0) $stop;

Here's what you would type if you want to stop execution when the sign of the value in the memory location changes:

if ($oldval*$newval <= 0) $stop;
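
Two details of the sign test are worth noting. The product form also stops when either value is zero, and in C a product of two large integers can overflow. How TotalView evaluates the product is not stated here, so the following is only a C sketch of a stricter test that compares signs without multiplying the values. The function names are hypothetical.

```c
/* Returns -1, 0, or 1 for negative, zero, or positive values. */
static int sign_of(long v)
{
    return (v > 0) - (v < 0);
}

/* Returns 1 when the value moved from strictly positive to strictly
 * negative or the reverse; 0 otherwise, including when either value
 * is zero. */
int strict_sign_flip(long oldval, long newval)
{
    return sign_of(oldval) * sign_of(newval) < 0;
}
```

If you do want a transition to or from zero to count as a sign change, as the product form does, change the final comparison to <= 0.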

How do I see which process is at a breakpoint?


Suppose you're running a multiprocess program and process 5 is in your Process Window. Suppose process 2 hits a breakpoint. How do you know this happened? Or worse, what effect could this have on process 5?

If the breakpoint is set at group level (which is the default), TotalView will also stop process 5 and you won't know why it is stopped unless you look at the Root Window and see that the process is stopped at a breakpoint.

If the breakpoint is set at process level, process 5 will continue executing while process 2 is stopped, and you won't know that this has even happened unless you look at the Root Window. If there are dependencies between processes, you could begin seeing other problems.

If you go to the Action Points Tab of the Preferences Dialog Box, you can change TotalView's behaviors.

By default, TotalView stops the group (this is actually the control group) when it hits a breakpoint. This may not be what you want. For example, you could also be stopping the process that controls your multiprocess behavior (for example, mpirun on some systems). Setting When breakpoint hit stop to Process tells TotalView that it should only stop the process hitting the breakpoint.

Selecting Open process window at breakpoint solves another problem. When, for example, process 2 hits that breakpoint, TotalView will open a Process Window showing you process 2 and, of course, you will see it stopped at the breakpoint.

How are breakpoints implemented?


You use breakpoints all the time. Do you know what's going on behind the scenes?

When you set a breakpoint, TotalView swaps the assembler instruction at that location with a "trap" instruction. When this instruction executes, the operating system raises a SIGTRAP signal and TotalView gains control.

When you resume execution, TotalView ensures that the swapped out instruction is executed.

Because TotalView is out of the way while the program is executing, it isn't adding any overhead to slow your program down.
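
The signal half of this mechanism can be seen in a few lines of C. This sketch does not show how TotalView itself plants trap instructions or intercepts the signal; it only has a process raise SIGTRAP on itself and catch it with its own handler, to show that a trap is delivered as an ordinary signal. It assumes a POSIX-like system where SIGTRAP is defined, and the names on_trap and trap_roundtrip are made up.

```c
#include <signal.h>

/* Set by the handler so the caller can tell that SIGTRAP arrived. */
static volatile sig_atomic_t trap_seen = 0;

static void on_trap(int signo)
{
    (void)signo;
    trap_seen = 1;
}

/* Installs a SIGTRAP handler, raises the signal, restores the default
 * action, and returns 1 if the handler ran (or -1 on setup failure). */
int trap_roundtrip(void)
{
    trap_seen = 0;
    if (signal(SIGTRAP, on_trap) == SIG_ERR)
        return -1;
    raise(SIGTRAP);              /* returns only after the handler has run */
    signal(SIGTRAP, SIG_DFL);
    return trap_seen;
}
```

In a debugging session, the debugger rather than the program is notified of the signal, which is what gives it control at the breakpoint.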

How do I avoid setting breakpoints that I only use once?


Often when debugging, you may be stepping through your code, but then you reach a loop that contains code that you're not interested in. To get over the loop, you could set a breakpoint at the end of the loop, run the process, and then delete the breakpoint when it is hit.

There's an easier way: click your left mouse button on a target line, and then click on the Run To button in the Toolbar. TotalView lets your process run until execution reaches this line.

If you are debugging an MPI program and you've set the Threads pulldown in the Toolbar to Group, all of your ranks run until they reach this line.

In the CLI, you can use the duntil command. Using the CLI has an advantage. Suppose there's a breakpoint between where the PC is and where you want to go. When execution reaches the breakpoint, execution stops and the context switches to the line having the breakpoint. This means you've lost the selected line. However, if you've used the duntil command, all you need to do is retype it, and everything continues to execute until it reaches this line.

Why doesn't TotalView change to the thread hitting a breakpoint?


TotalView can do this (and I'll tell you what to do in the next paragraph). We don't like to change your context because an event occurred. When your program hits a breakpoint in another thread, TotalView would have to change lots of displays. For example, a Variable Window would show the old thread, not the new thread. Same with the Expression List Window, and others. So, we decided that it would be better for you to do it explicitly rather than do it for you.

However, if you want TotalView to automatically change to this thread, you can set a preference. Invoke the File > Preferences Command and select the Action Points page.

Select Open process window at breakpoint and then press OK. TotalView will now focus on the thread hitting the breakpoint.

Creating a CLI conditional break macro



Sometimes, using the CLI can greatly simplify debugging activities. For example, setting an eval point that stops execution isn't difficult in the GUI, but you do have to remember the syntax. As an alternative, you can write a simple CLI macro that does this for you. For example,

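# Create an eval point at loc that stops execution when cond is true.
proc conditional_break {loc cond} {
    # Build the eval point expression, for example: if (my_var==20) $stop
    set expr "if ($cond) \$stop"
    dbreak $loc -e $expr
}
# Short alias so that the macro can be typed as cb.
alias cb conditional_break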

So, for example, if you would like to create an eval point on line 47 that stops execution when the variable my_var equals 20, all you would now need to do is open a CLI window and type:

cb 47 {my_var==20}

If you place this macro in your .tvdrc startup file, it will be there waiting for you whenever you need it.

Automatically loading actionpoints when not in executable directory


When TotalView loads a breakpoint file, it assumes that the breakpoint file is in the same directory as the executable. You can modify this behavior within TotalView by using the Actionpoint > Load All command. However, the pathname you enter is not persistent; that is, it only lasts within the current debugging session.

If you cannot store breakpoint files in the same directory, you can write a CLI macro that will do this for you by appending a function onto the TV::image_load_callbacks list. TotalView invokes the commands within this list whenever it loads a program.

Here's an overview of what you need to do:
  1. Create a breakpoints file. This could be a file created by TotalView from the breakpoints that exist when you use the Actionpoint > Save as command, or it could be a file containing dbreak commands.
  2. Create a Tcl macro that loads the breakpoints.
  3. Add this macro to the TV::image_load_callbacks variable.
    Let's assume that you're using an actionpoints file saved by TotalView and that you're always saving this file in your .tvdrc directory.

    # Automatically load breakpoints contained within a file within my
    # home directory whenever an executable or shared library is loaded
    #
    proc load_breakpoints {image_id} {
        set filename [TV::symbol get $image_id base_name]
        set tvd_filename "/home/me/.tvdrc/$filename.TVD.v3breakpoints"
        if {[file readable $tvd_filename]} {
            # puts "Processing startup commands from $tvd_filename"
            uplevel #0 [list daction -load $tvd_filename]
        }
    }
    # Append load_breakpoints to TV::image_load_callbacks. It is called
    # whenever an executable or shared library is loaded.
    #
    dlappend TV::image_load_callbacks load_breakpoints

What is a Pending Breakpoint?


A pending breakpoint is a breakpoint that is not yet resolved. For example, suppose you want TotalView to stop execution when my_important_func() begins executing. However, the function resides in a shared library that is not yet loaded.

Begin by selecting Action Point > At Location, then typing the function name in the Named area:

TotalView responds by displaying the following question:

Press the Yes button.

If you now look in the Action Points tab, TotalView tells you that it has created a pending breakpoint.

When TotalView writes its breakpoint file, this information is also written into it, thus preserving the breakpoint from session to session.

If what you had wanted was to create a barrier point or an eval point, just right click on its entry in the Action Points tab and select Properties from the context menu. You can now alter its properties in exactly the same way as you can alter a non-pending breakpoint.

How Do You Set Breakpoints on All Methods in a Class?


and...
How do you set breakpoints on virtual functions and their overrides?

The Action Point > At Location dialog box has always let you set a breakpoint at a function name or at a line number. It also lets you set a breakpoint on:
  • All methods in a class
  • All virtual functions and their overrides
Here is the At Location dialog box:

When doing these actions, TotalView creates a set of breakpoints instead of just one breakpoint.

If you look at the Action Points tab, however, it appears that TotalView only created one breakpoint. You can see that it actually created multiple breakpoints by scrolling through the Source Pane. And, if you dive on a breakpoint in the Action Points tab, TotalView displays a dialog box asking you which of the breakpoints you want to go to.

The Variable window


You can do more things in a variable window than you might think.

You can edit anything displayed in bold, which includes:
  • A variable or element's value. Simple arithmetic to create a value is ok. For example, you could type: 37*44.
  • Its address. You can even do simple arithmetic. For example, you could enter: 0x2ff22320+220.
  • Its type, which means that you can cast a type into another data type.
If you are seeing array data, you can:
  • Slice it, which means only show part of the array.
  • Filter it, which means only show values that meet a condition.
  • Sort the data.
  • Visualize the data.
  • Obtain statistics.
  • Treat an element in an array of structures as if it were a single array element (dive in all)
Other things you can do are:
  • Show the data across processes or threads (Show Across), which lets you see the information in each process or thread.
  • Chase pointers or see more information on an item by diving and return to where you just came from.

What's the simplest way to see a variable's value?


If you place the mouse cursor over a variable, TotalView displays a little window showing the variable's value:

In this figure, I placed the cursor over the total_threads variable. The yellow tooltip appeared, showing its value. By way of contrast, the figure also contains an Expression List Window with the same information.

If the variable isn't simple and the information can't be resolved into a one-line display, the tooltip will contain information about the variable. The information that you'll see is identical to that which TotalView displays in the Expression List Window.

When is a variable not a variable?


In TotalView, a variable can almost always be considered an expression. For example, suppose you want to see the currently indexed array element but you really don't know the index's value; that is, you want to see foo[i]. An "obvious" solution is to look up the value of i, then display the entire array and then look for that element. A second solution is to look up the value of i and then enter the array name and the value of i in View > Lookup Variable command to find something like foo[30]. A third is to place foo[i] in the Expression List window.

A fourth way is to use the View > Lookup Variable command and type (foo[i]) in C or (foo(i)) in Fortran. Notice the outer parentheses. After selecting OK, the value of foo[i] appears in a Variable window. This expression will appear in the Expression field.

The value displayed in the Variable Window does not change as the value of i changes. This is because TotalView doesn't reevaluate the value of i.

How do I display all values of one member in an array of structures?


Suppose you have the following Fortran definition:

type embedded_array
  real r
  integer, pointer :: ia(:)
end type embedded_array

type(embedded_array) ea (3)

After displaying ea in a Variable Window, select an r element, and then invoke the View > Dive In All command (which is also available when you right-click on a field). TotalView will respond by replacing the contents of the Variable Window with the three r elements of the ea array. These elements are treated as if they belong to a single array.

You can also use this command to unify the display of elements within a C array of structures as arrays. The user guide shows the results of diving on the a and the next elements in an array of structures.
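
One way to picture what Dive In All presents is as a gather of one member across every element of the array. The C sketch below mirrors the Fortran type with a hypothetical C struct and copies each r into a plain array. It illustrates the resulting view only; it is not a description of how TotalView builds it.

```c
#include <stddef.h>

/* C stand-in for the Fortran embedded_array type. */
struct embedded_array {
    float r;
    int  *ia;   /* stands in for the Fortran pointer array */
};

/* Copies member r of each of the n elements of ea into out[],
 * giving the "one member as a single array" view. */
void gather_r(const struct embedded_array *ea, size_t n, float *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = ea[i].r;
}
```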

How do I see global variables? You took away the Global Variables command!


You've got a bunch of choices:
  • If you can see a global variable, dive on it.
  • If you can't see it, look for it using the View > Lookup Variable command.
  • Use the Tools > Program Browser command to look at elements contained within your program.
The Program Browser command, introduced at version 6.0, replaces the Tools > Global Variables command. It does about the same thing, eventually. Unfortunately, the price of providing a lot more information is a bit more complexity.

The following figure shows how to use the Program Browser command:

After you select this command, TotalView displays a window something like the one shown in the upper left. It lists the programs and libraries that make up your executable. If you dive (click your middle mouse button) on one of these names, it will show components that make up the program or library. If you dive a second time on a file, which is the window in the lower right, you'll see information about variables in that file. To find more information, you'll need to dive again.

Unfortunately, previous versions of TotalView were very imprecise about scoping. This meant it showed you global variables that weren't really global as even global variables have scope. Now that TotalView gets it, you've got to know a little more about where your "global" variables are located before you can see them.

How do I change the precision and size at which data is displayed?


In most cases, TotalView does a reasonable job of displaying a variable's value. If TotalView's defaults aren't what you want, you can change the way it displays simple data types by using the controls within the Formatting Page of the File > Preferences Dialog Box.

After selecting one of the data types listed on the left, you can set how many character positions TotalView will use to display a value of that type (Min Width) and how many digits it should display to the right of the decimal place (the Precision). You can also tell TotalView how it should align the value in the Min Width area and if it should pad numbers with zeros or spaces.

While the way in which these controls interrelate can be complex, the Preview area shows what the effect of your change is. After you play with these controls for a minute or two, what each control does will become clear. When experimenting, you should set the Min Width value to a larger number than you need so that you can see what happens when you make a change. For example, if the Min Width doesn't allow a number to be justified, it could appear that nothing is happening.

How do I look at my argv list?


Typically, argv is the second argument passed to main(), and it is either a char **argv or char *argv[ ]. Since these declarations are equivalent (a pointer to one or more pointers to characters), TotalView converts both to $string** (a pointer to one or more pointers to null-terminated strings).

Suppose argv points to an array of three pointers to character strings. Here is how you can edit its type to display an array of three pointers:
  1. Select the type string for argv.
  2. Edit the type string using the field editor commands. For this example, change it to:
    $string*[3]*

  3. To display the array, dive on the value.
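
If you don't know how many pointers argv holds, and so what bound to put in the brackets, remember that C guarantees argv[argc] is a null pointer. The following C sketch walks an argv-style array until it finds that terminator. The function name count_args is hypothetical.

```c
#include <stddef.h>

/* Counts the strings in an argv-style array by walking the pointers
 * until the terminating null pointer (argv[argc] is NULL in C). */
int count_args(char **argv)
{
    int n = 0;
    while (argv[n] != NULL)
        n++;
    return n;
}
```

For a program started with two arguments, this returns 3 (the program name plus the two arguments), which matches the [3] used in the type string above.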


How do I `untransform' STL data?


At release 6.2, TotalView automatically transforms STL lists, maps, and vectors so that they are displayed logically. That is, you no longer view the data in the same way that the compiler sees it. If you need to see it in this way, open the File > Preferences Dialog Box. Within the Options Page, uncheck the View simplified STL containers (and user-defined transformations) item.


Using the TTF (Type Transformation Facility) to simplify structures?


The following small program contains a structure and the statements necessary to initialize it:

#include <stdio.h>
#include <string.h>   /* for strcpy() */

int main(void) {
    struct stuff {
        int month;
        int day;
        int year;
        char *pName;
        char *pStreet;
        char CityState[30];
    };

    struct stuff info;
    char my_name[] = "John Smith";
    char my_street[] = "24 Prime Parkway, Suite 106";
    char my_CityState[] = "Natick, MA 01760";

    info.month = 6;
    info.day = 20;
    info.year = 2004;
    info.pName = my_name;
    info.pStreet = my_street;
    strcpy(info.CityState, my_CityState);

    printf("The year is %d\n", info.year);
    return 0;
}

Suppose that you do not want to see the month and day components. You can do this by creating a transformation that names just the elements you want to include:

::TV::TTF::RTF::build_struct_transform {
    name {^struct stuff$}
    members {
        { year { year } }
        { pName { * pName } }
        { pStreet { * pStreet } }
    }
}

You can apply this transformation to your data in the following ways:
  • After opening the program, use the Tools > Command Line command to open a CLI Window. Next, type this function call.
  • If you write the function call into a file, use the Tcl source command. If the name of the file is stuff.tvd, enter the following command into a CLI Window:

    source stuff.tvd
  • You can place the transformation source file into the same directory as the executable, giving it the same root name as the executable. If the executable file has the name stuff, TotalView will automatically execute all commands within a file named stuff.tvd when it loads your executable.
Here is how TotalView displays the transformed structure in a Variable Window:

For more information, see our Reference Guide.

How do I view a static class object declared inside a function?


TotalView treats a static class object as if it were a global variable. That is, you can access a static from any scope. Here is a program that contains a static object. The steps following the code show six of the ways (there are more) to access the object.

 1  #include <iostream>
 2
 3  class ConnMgr {
 4  public:
 5      int getConnMgr(void);
 6  private:
 7      static int connMgr;
 8  };
 9
10  int ConnMgr::connMgr = 0;
11
12  int ConnMgr::getConnMgr(void) {
13      ConnMgr::connMgr++;
14      return connMgr;
15  }
16
17  using namespace std;
18
19  int main(int argc, char *argv[]) {
20
21      ConnMgr a, b;
22
23      for(int i=0; i<8; i++) {
24          cout << a.getConnMgr() << endl;
25          cout << b.getConnMgr() << endl;
26      }
27
28      return 0;
29  }

After compiling this program, bring it up within TotalView. You're now ready to display the connMgr object.

Method 1
  • Set a breakpoint on line 14 and then click Go.
  • Dive by middle-clicking on connMgr on line 14.
Method 2
  • Close the Variable Window.
  • Highlight ConnMgr::connMgr on line 13, right-click to display a context menu, and then click on Dive.
Method 3
  • Close the Variable Window.
  • Delete the breakpoint on 14 by clicking on the stop symbol.
  • Set a breakpoint on line 25 and then click Go.
  • Display the View > Lookup Variable dialog box. It's easier to type the accelerator, which is "v".
  • Type ConnMgr::connMgr. (I don't need to tell you to select OK or press Enter.)
Method 4
  • Close the Variable Window.
  • Display the View > Lookup Variable dialog box, then type static.cxx#ConnMgr::connMgr. (static.cxx is the program's file name.)
Method 5
  • Close the Variable Window.
  • Display the View > Lookup Variable dialog box, then type ##vxchg.tsk#static.cxx#ConnMgr::connMgr.
Method 6
  • Close the Variable Window.
  • Select the Tools > Program Browser command. Dive on staticLinux. In the displayed window dive on static.cxx, and then dive on ConnMgr::connMgr.

Is there any easy way to see a group of variables?


You can track the values of any number of elemental values by using the Expression List Window.

There are a number of ways to get variables into this window:
  • Place your cursor over it in the Source Pane, right-click, then select Add to Expression List.
  • Select the variable or a part of a variable, then right-click, and select Add to Expression List.
  • Type directly into the window.
You can also right-click on an element within a Variable Window, and then select Add to Expression List.

Whenever your thread stops executing, all values within the Expression List Window are updated. So, for example, if your program is executing within a loop and you set a breakpoint within it, you can see the changes that occur to these variables' values.

Here are some things you should know:
  • If you dismiss this window and bring it back up, TotalView remembers the variables that were in it.
  • These values are remembered for your entire TotalView session. If you delete your program and restart it, these values don't go away. That is, you don't have to re-add them.
  • Notice that array indices were entered as a variable. Whenever execution stops, TotalView reevaluates the variable. This means that as the array index changes, so will the value being displayed.

What kind of data goes into the Expression List window?


The Expression List Window can only show scalar values such as integers, floating point numbers, and strings. If an expression does not have a scalar value, you'll see information about the expression.

While almost all data resolves to scalars, the actual expression of data in your program is seldom this simple. Data is aggregated into structures and arrays. Data is placed into allocated memory and accessed using pointers. Data is accessed using expressions.

The first clue about what can be displayed is the window's name. That is, why is it an "expression list" rather than a "variable list?" The answer is that a variable is actually an expression. Technically, it is an lvalue, something I won't define here. And, your program is always evaluating the lvalue to locate a location in memory. For example, while it may not be clear that my_var is an expression, my_var.an_element and my_var[i*j] clearly are.

Here is a window having four expressions:

Here are explanations:

i
A variable with one value. The Value column shows its value.

d1_array
An aggregate. This could be an array, a structure, a class, and so on. An aggregate's value cannot be displayed in one line. Consequently, TotalView just gives you some information about the expression in the Value column.

Whenever you place an aggregate in the Expression column, you will need to dive on it to get more information. After you dive, TotalView displays the expression in a Variable Window.

d1_array[1].d1_v
If TotalView can resolve what you enter in the Expression column into a single value, it will display a value in the Value column. If it can't, TotalView displays information in the same way that it displays information in the d1_array example. In this example, TotalView is showing information about an element within an array of structures.

d1_array[i-1].d1_v
This entity differs from the previous example in that the array index is an expression. Whenever execution stops in the current thread, TotalView reevaluates i-1. This means that TotalView might display the value of a different array item every time execution stops.

The expressions you enter cannot include function calls or have side-effects. For example, d1_array[i++].d1_v won't work. (You'll get an error message if you try this.)

How do you manipulate data in the Expression List window?


Here's the Expression List Window.

  • Changing order: manually change the order of displayed expressions by clicking on the up and down arrows.
  • Sorting: click on a header to sort a column into either ascending (click once) or descending (click twice) order. A third click returns the list to its original unsorted order.
  • Changing thread context: either select or type the thread in which TotalView will evaluate expressions.
  • Deleting: right-click on a line (for deleting an expression) or anywhere (for deleting all expressions) and then select a delete command.
  • Floating scope: normally, TotalView only evaluates a variable in the scope in which it is declared. If you select Floating from the context menu, TotalView will look for the variable in the current scope; that is, the scope of the current PC. This is really handy when looking at recursive routines or variables such as *this in C++ programs.
  • Diving: middle-click on a line to tell TotalView to display the variable in a separate Variable Window.
  • Changing displayed columns: If you right-click on the column display control, TotalView displays the following context menu:

    Using this menu, you can add and delete columns from the display. Perhaps the most interesting column is Comment. Here, you could enter, for example, an expression's value. This lets you keep track of what the value was as your program executes.

Casting variables


One of the major strengths of C and C++ is that you can cast a variable's data type. Here's an example that shows four casts for the same variable:

The Built-in Types section of the TotalView Users Guide lists all of TotalView's built-in data types.
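
The underlying idea of a cast in a debugger is that the same bytes in memory are displayed under a different type. As a small C illustration (not a reproduction of the example referred to above), the following sketch reads the bytes of a float as a 32-bit unsigned integer. It assumes that float is a 32-bit IEEE 754 value, which is common but not guaranteed by the C standard. The function name float_bits is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Returns the bit pattern of a float as a 32-bit unsigned integer.
 * Assumes a 32-bit IEEE 754 float. The bytes are copied, not
 * converted, so this is a reinterpretation of the same memory. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Under that assumption, float_bits(1.0f) yields 0x3f800000: the same four bytes that display as 1.0 when viewed as a float.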

How do I track variables in recursive functions without having thousands of windows?


When your program calls a recursive function, it places its local (automatic) variables on the stack. So, if a function is called a hundred times, your program has a hundred stack frames, each with its own copy of the function's variables. This means that you have a hundred different versions of a variable.
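
The following C sketch makes the "one copy per stack frame" point observable. Each call records the address of its own local variable, and the deepest call counts how many distinct addresses were recorded while all of the frames are still live. The names count_live_copies, frame_addr, and MAX_DEPTH are invented for this example, and limit must be between 1 and MAX_DEPTH.

```c
#define MAX_DEPTH 16

/* Address of the local variable in each active frame. */
static const int *frame_addr[MAX_DEPTH];

/* Recurses until `depth` reaches `limit - 1`. Each call records the
 * address of its own local; the deepest call counts how many of the
 * recorded addresses are distinct while every frame is still live. */
int count_live_copies(int depth, int limit)
{
    int local = depth;
    int result = 0;

    frame_addr[depth] = &local;
    if (depth + 1 < limit) {
        result = count_live_copies(depth + 1, limit);
    } else {
        for (int i = 0; i <= depth; i++) {
            int unique = 1;
            for (int j = 0; j < i; j++)
                if (frame_addr[i] == frame_addr[j])
                    unique = 0;
            result += unique;
        }
    }
    return result;
}
```

Calling count_live_copies(0, 8) returns 8: eight nested calls, eight separate copies of local.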

If you need to track a variable, you either need to open a Variable Window for each or throw each in an Expression List Window.

A better solution is to tell TotalView that it should float the scope. That is, whenever the thread stops executing, TotalView looks for the variable in the current scope. If it finds the variable in this scope, it updates the value.

To change the scope, right-click on a variable and select Compilation Scope, then either Fixed or Floating:

After changing the scope to Floating for four of the five instances of the i variable, the program was stepped a few times. Here's the updated Expression List Window:


Evaluation, scope, and this


Several recent tips have looked at scope and evaluation. The essence of those discussions was that TotalView understands the scope in which a variable exists. This means, for example, that if the program has more than one variable with the same name, TotalView knows which is which and knows what information it should display.

Since there is always a this pointer that can reference an object, TotalView can differentiate each, no matter how many objects you have. The following example shows a Variable Window created by diving on a this pointer:

If you want to see the value of this for a second object, you would dive on a second this pointer and TotalView displays a second Variable Window. And diving on a third displays a third Window. And so on.

There's a simpler way: select the View > Compilation Scope > Floating command.

This command tells TotalView to use its current scope (that is, the scope defined by the PC) whenever it reevaluates your variable. This reevaluation occurs when execution stops or you choose a Windows > Update or Windows > Update All command.

That is, after you select this command, this gets evaluated in whatever scope exists when TotalView is stopped. In this example, if there is a this pointer in that scope, TotalView updates the Variable Window to show the value to which this is pointing. You only need one window to track what is being pointed to.

The following snapshot shows a Process and a Variable window. The Variable window contains a this variable and the Process window shows the selected stack frame.

(Actually, I dove on the this pointer so that the difference between this picture and the next would be very obvious.) I also selected the More button so that you could see more of the meta information TotalView keeps track of. This information shows you the scope in which this was compiled.

Next, I selected a different stack frame. Here's how the display changed:

Because the compilation scope was floating, clicking on a different stack frame told TotalView to reevaluate the variable, and the reevaluation took place in a different scope. This is indicated by the "Compiled in Scope" line.

If you have a lot of Variable windows scattered around your desktop, you can close them all with one command: just select File > Close All. Similarly, selecting this command from the Expression List window closes all open Expression List windows.

How do I edit a variable's value?


You can edit variable values being displayed in the Variable Window, Expression List Window, and the Stack Frame Pane of the Process Window. There are two ways to do this. In both cases, begin by selecting the variable:
  • In the Variable Window and Expression List, press the F2 key.
  • In the Variable Window, Expression List Window and in the Stack Frame Pane, double-click your left mouse button.
After TotalView displays an editing cursor, you can change the variable's value.


Editing Strings


Editing strings within a Variable or Expression window is usually very straightforward. There are, however, conditions where an edit causes problems.

Buffer size
If you make a string longer than its original size, you can trash the memory following the string. For example, if the original string is abc, and you edit it to abcdefg, you are adding four characters to the string. Depending on where the string is allocated, the size of the buffer, and the contents of memory following the string's buffer, you can trash global memory, the heap, or the stack. So, if you make the string longer, you need to know the size of the memory allocated for it.
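The arithmetic behind this warning can be sketched in C. The helper names string_fits and overflow_bytes are hypothetical and are not part of TotalView. They only show that a C string needs one byte per character plus a terminating NUL, so abc needs 4 bytes while abcdefg needs 8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a TotalView function): returns 1 if new_value,
 * including its terminating NUL byte, fits in a buffer of `capacity` bytes. */
int string_fits(size_t capacity, const char *new_value)
{
    assert(new_value != NULL);
    return strlen(new_value) + 1 <= capacity;
}

/* Number of bytes that would be written past the end of the buffer
 * if new_value were stored anyway (0 when it fits). */
size_t overflow_bytes(size_t capacity, const char *new_value)
{
    size_t needed;

    assert(new_value != NULL);
    needed = strlen(new_value) + 1;
    return needed > capacity ? needed - capacity : 0;
}
```

With a 4-byte buffer that holds abc, overflow_bytes(4, "abcdefg") reports the four extra bytes that the edit would write into whatever follows the buffer.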

Escape codes
TotalView interprets escape codes in the same way as your compiler. For example, you enter a newline into a string by typing "\n", which is exactly what you would type in C. For example, if your string is abc and the buffer is big enough to hold two more characters, you can add two newlines into the string by editing it to a\nb\nc.
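Here is a small C sketch of why a\nb\nc needs only two more characters than abc: each \n is typed as two characters but stored as a single byte. The helper names are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts the newline characters stored in a string. */
size_t count_newlines(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (*s == '\n')
            n++;
    }
    return n;
}

/* Bytes of buffer a string occupies, including its terminating NUL. */
size_t bytes_needed(const char *s)
{
    assert(s != NULL);
    return strlen(s) + 1;
}
```

For these two strings, bytes_needed("abc") is 4 and bytes_needed("a\nb\nc") is 6.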

Using the Enter Key
In general, when you are editing a value, you can commit your changes (that is, write the changes back to memory) by pressing the Enter key. While this is true in most places, it isn't true when editing a string in the Variable window. This window displays long strings by wrapping them within the Value area. Consequently, pressing Enter inserts a newline character. To get around this problem, you can either type Ctrl+Enter or click elsewhere in the window.

If your string contains newlines, selecting the View > Break At Newlines command toggles between showing you the "\n" in the string or forcing a line break in the displayed string.

The Variable window and expressions


Just calling the Variable Window a Variable Window is a bit misleading. Instead you should think of this window as being a tool that allows you to explore memory values. Controls within the window tell TotalView how it should display these values.

The values displayed within the Variable Window are actually lvalues, or location values. Each value is the result of evaluating expressions within rvalues, or read values. The "r" and "l" do not mean right and left, even though, for example, rvalues can only appear on the right.

Here's what this means. Assume that you have a program with a three-dimensional array named array that is indexed by the first, second, and third variables, and you dive on first to see its value. If you don't want to see the values of second and third in separate windows, you can just change the Expression field to either second or third. Similarly, to see the value contained within array(first, second, third), just type this text into the Expression field. This all works because first, second, third, and array(first, second, third) are all expressions.

Expressions and View > Lookup variable


The View > Lookup Variable Dialog Box also uses expressions. Here's the dialog box:

Suppose you have an array named foo and the variable counter is indexing the array. If you type foo(counter+3) in Fortran or foo[counter+3] in C or C++, TotalView will evaluate the expression counter+3, then determine the offset within the foo array. Finally, the value displays in a Variable Window.

In a similar fashion, if you place your cursor over foo(counter+3) within the Source Pane, TotalView performs the same operations before it displays information in a Tooltip.

Seeing global variables


TotalView can show you the value of global variables when they are defined within files that were compiled using -g. Some variables, unfortunately, are defined as extern. Or, variables may be defined inside library files such as libc. Unless these libraries are built with debugging information, TotalView cannot map the variable to a location in memory. If you run into this problem, TotalView displays a "bad address" error message.

This doesn't mean that TotalView cannot show the value. You've got at least two ways to get at this information. One way is to refer to the variable using its loader symbol. For example, if your program uses the optind variable contained within libc, you can obtain its loader symbol as follows:

% nm a.out | grep optind 
080494f0 B optind@@GLIBC_2.0

Next, select the View > Lookup Variable command and type optind@@GLIBC_2.0. (You can also type this in the Tools > Expression List Window.)

A second way is to define a global pointer to the variable and then recompile. For example:

int *ptr_optind=&optind;

Now, whenever you want to look at optind, look at ptr_optind instead.

Examining memory


The View > Examine Format > Structured or View > Examine Format > Raw commands from within the Variable Window tell TotalView to display raw memory contents.

This data display is similar to how operating system dump commands such as od display data.

When displaying a structured view, the left side of the Window shows the elements of the data, whether it be a structure or an array. The center shows data as it is normally displayed within TotalView. The right displays the raw memory data. By default, this display is in hexadecimal. However, you can change it by selecting entries within the Format pulldown.

Use the Bytes radio buttons to change the number of bytes grouped together and Count to indicate how much memory to display.
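If you have never looked at a raw memory display before, the following C sketch shows the basic idea of printing the bytes at an address in hexadecimal, one byte per group. It is only an illustration. The function name format_raw_bytes is hypothetical and says nothing about how TotalView builds its display.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of TotalView): writes the first `count`
 * bytes at `addr` into `out` as two-digit hex values separated by spaces.
 * Returns the number of characters written, or -1 if `out` is too small. */
int format_raw_bytes(const void *addr, size_t count, char *out, size_t out_size)
{
    const unsigned char *bytes = addr;
    size_t used = 0;
    size_t i;

    assert(addr != NULL && out != NULL && out_size > 0);
    out[0] = '\0';
    for (i = 0; i < count; i++) {
        /* Each byte needs 2 hex digits, plus a space before all but the first. */
        size_t needed = (i == 0) ? 2 : 3;

        if (used + needed + 1 > out_size)
            return -1;
        if (i == 0)
            snprintf(out + used, out_size - used, "%02x", (unsigned)bytes[i]);
        else
            snprintf(out + used, out_size - used, " %02x", (unsigned)bytes[i]);
        used += needed;
    }
    return (int)used;
}
```

Passing the address of a structure or array and its size in bytes produces one line of hex text, which you can compare against the raw column in the Variable Window.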

Expressing expressions


The TotalView expression system is often overlooked even though it is one of TotalView's most important components. This is because the expression system mostly works behind the scenes, interpreting variable information so that it can be displayed. In fact, if the expression system didn't exist, TotalView would not be able to display variables.

There are places where it is visible, and in almost all places where it is visible, you can edit the expression. Here are two variable windows, each containing a Fortran array.

The Expression field contains editable information and the changes you make can refer to any in-scope variable. For example:

Here, one element in one array is subtracted from an element in a second array. Both are indexed by expressions involving the variable i. The result is displayed in the Value area. Because TotalView creates this value, the Address field displays (None). This is because the value does not exist in your program's memory space.

TotalView also implements many of the operators found in C, C++, and Fortran. For example, you can use Fortran's built-in array addition operator:


The first example shows two Tools > Evaluate windows. The left one shows a variable's value in all processes within an MPI control group. The second multiplies the value by itself.

Expressions are, of course, a fundamental part of the Expression List Window. The Expression List window on the left shows three arrays. Because each has more than one value, TotalView just shows a data type. Note, however, that the third array is actually an array expression, adding the values of the first two arrays. (These arrays do not have to be in the Expression List Window before they're used. They're just shown here because this is an example.) When you dive on this array expression, TotalView shows all of the array's values in a Variable Window.

Finally, you can also use expressions in the Process Window's Stack Frame pane.


The TotalView expression system does more than just replicate C, C++, and Fortran language constructs. It has its own primitives that you can either use separately or combine with language statements. For example, here is a C expression that is used in an eval point:

if (MyVar == 3) $stop;

The Fortran equivalent is:

if (MyVar .eq. 3) $stop

This expression tells TotalView that it should check the value of MyVar and if the value is 3, it should stop program execution. $stop is a TotalView built-in statement that controls execution.

Other built-ins look at data. For example, the $oldval and $newval primitives let you compare value changes in conditional watchpoints. For example:

if ($oldval >= 0 && $newval < 0) $stop

This tells TotalView to stop execution when the value in the watched location becomes negative.

Naturally, TotalView looks at the values of the in-scope variables. Behind this is a system that sometimes shows blocks using a rather strange nomenclature.

The following figure shows a TotalView Process Window. The program within it is also strange, containing just declarations of the variable i, each in a different block, and these blocks are nested within one another.

Each of the five declarations of the variable is within its own scope. If you look in the Stack Frame pane, you'll see that each of the five is placed within a block, and the block is given a name.

Seeing a variable's value in all threads or all processes


When running a multiprocess or multithreaded program, there are many times when you want to see a variable's value in all of the program's processes or threads. Do this by selecting either the View > View Across > Process or the View > View Across > Thread command. Here's an example for an eight-process program:


Seeing changed variables


If a variable is being displayed in a Variable window, TotalView lets you know if a variable's value changed since the last time the window was updated. It does this by highlighting the value in yellow:

If you select the icon at the right side of the Value row and then click on Last Value from the displayed list, TotalView adds a column:

This lets you see what the value changed from.

TotalView will also highlight and show previous values for individual elements within compound variables such as arrays and structures. For example:


TotalView also does the same thing in the Expression List window.

Clicking on the icon on the right side of the Expression/Value header row lets you display additional columns. One of these columns is, of course, Last Value.

Limitations
The highlighting and last value features have a limitation: TotalView only saves some of the values associated with a variable. For example, if you have a 100,000 element array and are displaying 20 elements, only about 20 are saved. This means that when you scroll down to see additional values, TotalView doesn't know what the last values were. This means it can't determine if a value changed or display what the last value was. Also, scrolling the Variable window can cause new values to be saved, which means old values are lost.

If you really need to keep previous values around, you should use the Expression List's Comment column:


Printing All Values in a Fortran Module


Here is a Tcl script that walks all of the variables contained within a Fortran module and prints each one's name:

set image_name "the_executable_name"
set mod_name "the_module_name"

# Note: image_id must already hold the ID of the image named by image_name.
set mod [TV::scope lookup $image_id by_language_rules $mod_name]
set vars [TV::scope walk $mod by_properties kind variable]
foreach sym $vars {
    set name [TV::symbol get $sym base_name]
    puts $name
}


Freezing Variable Window Data



Whenever execution stops, TotalView updates the contents of Variable Windows. More precisely, TotalView reevaluates the data found within the Expression area. If you do not want this reevaluation to occur, use the Variable Window's View > Freeze command. This tells TotalView that it should not change the information that is displaying.

After you select this command, TotalView displays a marker that lets you know that the data is frozen.

Selecting the View > Freeze command a second time tells TotalView that it should evaluate this window's expression whenever execution stops.

In most cases, you'll want to compare this information with an unfrozen copy. Do this by selecting the Window > Duplicate command before you freeze the display. As these two windows are identical, it doesn't matter which one you freeze. If you use the Duplicate command after you freeze the display, just select View > Freeze in one of the windows to get that window to update normally.

Displaying Returned Values When the Value Isn't Assigned to a Variable


If a function's returned value isn't assigned to a variable, the way you display the returned value isn't obvious. Here are three examples where this can happen:

if (foo() || bar() || anotherfoo()) { doSomething };
double a=f(3)+g(4)*h(5);
foo(); // called in void context and foo() is:
int foo()
{
    do_something;
    return some_expression;
}

To see the returned value, you need to figure out which register the program is using to hold the returned value. For example, this is $eax on x86 architectures and $rax on x86-64.

Here's how you display these values on an x86 system for the functions in the first example:
  1. Step into foo().
  2. Select the Out button.
  3. Select the View > Lookup Variable command (the accelerator is "v") and locate the value contained within $eax. You need to cast the value rather than $eax because TotalView does not let you cast a register.
  4. Enter a cast into the Type field to see the returned value.
  5. Step into bar().
  6. Select the Out button.
  7. Locate the value contained within $eax.
  8. Repeat as necessary.

Locking the Address


The Freezing Variable Window Data tip discussed freezing the display so that TotalView does not update the Variable Window's contents. Sometimes you only want to freeze the address, not the data at that address. Do this by selecting the View > Lock Address command. The following figure shows two variable windows, one of which has had its address locked.

Using this command lets you continually reevaluate what is at that address as execution progresses. Here are two examples of when you would use this command:
  • If you need to look at a heap address accessed through a set of dive operations rooted in a stack frame that has become stale.
  • If you dive on a *this pointer and want to see the actual value after *this goes stale.

Examine syntax


If you are familiar with the examine command within GDB, you can use an undocumented CLI command. The new x command is actually a wrapper to the CLI's dexamine command. Also, it is slightly modified from the command used in GDB.

The syntax for this command is:

x/CountFormatSize {expression}

where:

Count
The number of items being displayed

Format
The way in which information is displayed:
a: address
b or t: binary
c: char
d: decimal
f: float
h or x: hexadecimal
i: instruction
o: octal
s: string

Size
1 or b: 1 byte
2 or h: 2 bytes
4 or w: 4 bytes
8 or g: 8 bytes

{ }
Curly braces are not required. However, you should wrap expressions in them to keep Tcl from interpreting $ and [ ].
Here's an example:

x/24xw {$rsp}

This command examines (x/) 24 items, displayed in hexadecimal (x) at word size (w), starting at the top of the stack. Here is sample output from this command:

d1.<> x/24xw {$rsp}
0x7fbfffcbb0: 0x00000000 0x00000000 0x959aba9d 0x0000002a
0x9599c760 0x0000002a 0xbfffcca8 0x0000007f
0x7fbfffcbd0: 0xbfffcc00 0x00000001 0x00406390 0x00000000
0x9566b560 0x0000002a 0x0040d440 0x00000000
0x7fbfffcbf0: 0x004024e0 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x7fbfffcc10:


How do I display an array slice? That is, how do I display part of an array?


When displaying large arrays, you may only want to see a part of it. For example, you've got a 100 element array and you're only interested in the first 10.

As this example shows, you just enter what you need to see. You can even modify what's being shown so that TotalView displays every second or third (or whatever) value.

And, you can also add a filter to a slice so that TotalView only shows values meeting the criteria you specify.


Manipulating arrays


Another tip showed TotalView creating an array based on one element in an array of structures. You might be wondering if you can treat these kinds of arrays the same as normal arrays. The answer is "yes". This is because TotalView's array operations work upon the array elements being displayed, not the underlying array or data.

This means that you can use all TotalView commands and operations that affect arrays. For example, you can use the View > Sort command to sort the data or the Tools > Statistics to obtain information about the distribution of this information. You can also filter, slice, and visualize this array.

This isn't something new added to manipulate these kinds of arrays. You can do these kinds of operations upon all filtered and sliced arrays. There's one caution: TotalView operates upon what it thought your data was. So, if the data is being updated from other processes or threads, you may get inconsistent results. (But you knew that!)

How do I select which array values I want displayed?



While a large array contains a lot of information, seeing all the data can be overwhelming when you're trying to locate a problem. You can reduce what TotalView displays if you create a filter. For example, here's an illustration that uses filters that tell TotalView that it should only display denormalized numbers in one figure and infinite values in the other.

Here are examples showing some of the things you can do:
  • Only display array elements greater than 0

    > 0
  • Only display array elements greater than 125 and less than or equal to 135

    > 125:<=135
  • Do the same thing for Fortran

    .gt. 125:.le.135
  • Only display array elements containing NAN (Not A Number) values

    == $nan
  • Only display array elements whose values are greater than 0 and less than 50 or greater than 100 and less than 150

    ($value > 0 && $value < 50) ||
    ($value > 100 && $value < 150)
For more information, see Arrays in the TotalView Users Guide.
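If it helps to think of a filter as code, the last filter above behaves roughly like the following C sketch, where value stands in for $value. The function names are hypothetical, and this is only a model of the filter's effect, not TotalView's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Same test as the last filter above, with `value` standing in for $value. */
int passes_filter(double value)
{
    return (value > 0 && value < 50) || (value > 100 && value < 150);
}

/* Copies the elements that pass the filter into `kept` (which must have room
 * for `count` elements) and returns how many were kept. */
size_t filter_array(const double *values, size_t count, double *kept)
{
    size_t n = 0;
    size_t i;

    assert(values != NULL && kept != NULL);
    for (i = 0; i < count; i++) {
        if (passes_filter(values[i]))
            kept[n++] = values[i];
    }
    return n;
}
```

Because both comparisons in each range are strict, the boundary values 0, 50, 100, and 150 are not displayed.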

How do I display every 13th item in an array?


When you look at a Variable Window, it's pretty obvious that you can specify a lower and an upper bound for an array. It's not obvious that you can also specify a stride. (The stride tells TotalView how often it should display array elements.) Enter a stride into the Slice field by typing a colon and the number of values that should be skipped. Here are two examples:

The window in the upper left corner shows a 2D array where every 13th element is displayed. Notice that TotalView begins by displaying the element at an array dimension's lower bound. It then begins skipping elements.

You can combine a stride with a lower bound or an upper bound (or both). The window in the lower right corner tells TotalView to begin displaying data beginning at element (13, 13) before figuring out how often it should display elements.
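For a single dimension, the elements that a stride selects can be sketched as a simple loop. The function slice_indices is a hypothetical helper written for this tip. It starts at the lower bound, as described above, and then steps by the stride until it passes the upper bound.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: lists the indices selected by lower:upper:stride,
 * starting at the lower bound and including the upper bound if the stride
 * lands on it. Writes at most max_out indices; returns the number written. */
size_t slice_indices(long lower, long upper, long stride, long *out, size_t max_out)
{
    size_t n = 0;
    long i;

    assert(out != NULL && stride > 0);
    for (i = lower; i <= upper && n < max_out; i += stride)
        out[n++] = i;
    return n;
}
```

For a 100-element dimension indexed from 0, a stride of 13 selects indices 0, 13, 26, and so on up to 91. Starting at a lower bound of 13 instead selects 13 through 91.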

How do I dive on dynamically allocated 2D arrays in C?


When you want to examine the contents of a dynamically allocated 2D array, you have to give TotalView some hints about the size of the dimensions. To do this, you'll need to cast the variable's data type after you dive on it. (Casting is described in the TotalView Users Guide.)
The following program will be used to demonstrate this:

#include <stdio.h>
#include <stdlib.h>

#define SIZE1 5
#define SIZE2 10

int main( int argc, char *argv[]) {
    int nx = SIZE1;
    int ny = SIZE2;
    int i=0, j=0;
    double **u, **v;

    /*
     * Dynamically allocate one large contiguous chunk of data
     * for 2D array.
     */

    /* First allocate pointers to rows. */
    u = (double **) malloc(nx * sizeof(double*));

    /* Then allocate rows and set pointers to them. */
    u[0] = (double *) malloc((nx * ny) * sizeof(double));

    for(i = 1; i < nx; i++ ) {
        u[i] = u[i-1] + ny;
    }

    for(i=0; i < nx; i++) {
        for(j=0; j < ny; j++) {
            u[i][j] = (double) ((i*10) + j);
        }
    }

    /*
     * Dynamically allocate each row independently
     */

    v = (double **) malloc(nx * sizeof(double*));

    for(i = 0; i < nx; i++ ) {
        v[i] = (double *) malloc(ny * sizeof(double));
    }

    for(i=0; i < nx; i++) {
        for(j=0; j < ny; j++) {
            v[i][j] = (double) ((i*10) + j);
        }
    }
}

Compile this program using a command like:

gcc -g -o 2darray 2darray.c

Set a breakpoint at the end of the program, line 56, and go to the breakpoint.

The 2D array named u was defined in one contiguous chunk of memory. This is similar to how static 2D arrays are stored. Here's the Variable Window TotalView displays after you dive on u:

To tell TotalView that u is a pointer to a 5x10 array, you need to change (that is, cast) the data type. Here is what TotalView displays after casting the data type to double[5][10]**:

Now, dive on the Value line:

We're almost there. TotalView now knows that u is pointing to a 5x10 array. When you dive on this pointer, TotalView will display the 2D array:

Let's now look at the v array variable. This array was created by independently allocating each row. Therefore, you need to cast the variable in a slightly different way. (This time, I'll combine all the snapshots into one.)
  1. Dive on the v variable
  2. Change the type to double[10]*[5]*. This says that v is a pointer to an array of 5 double pointers to arrays of 10 doubles.
  3. Dive on the Value line and the new Variable Window shows an array of pointers.
  4. To see the array pointed to by any of these five array elements, dive on the line. In this case, I dove on the first line repeatedly.
    [IMAGE: Variables_diving_dynamic_array.png]
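The reason u and v need different casts is their memory layout. Here is a hedged C sketch that rebuilds the u-style allocation and checks that every row begins exactly ny elements after the previous one. The function names are invented for this illustration. For an array built like v, where each row comes from its own malloc call, this check generally fails, which is why v has to be viewed as an array of pointers instead.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocates an nx-by-ny array the way u is built above: one block of row
 * pointers plus one contiguous block holding every element. Returns NULL
 * on allocation failure. */
double **alloc_contiguous(int nx, int ny)
{
    double **rows;
    int i;

    assert(nx > 0 && ny > 0);
    rows = malloc((size_t)nx * sizeof *rows);
    if (rows == NULL)
        return NULL;
    rows[0] = malloc((size_t)nx * (size_t)ny * sizeof **rows);
    if (rows[0] == NULL) {
        free(rows);
        return NULL;
    }
    for (i = 1; i < nx; i++)
        rows[i] = rows[i - 1] + ny;
    return rows;
}

/* Returns 1 if every row starts exactly ny elements after the previous one,
 * which is the layout that a cast to a single 2D array type assumes. */
int rows_are_contiguous(double **rows, int nx, int ny)
{
    int i;

    assert(rows != NULL);
    for (i = 1; i < nx; i++) {
        if (rows[i] != rows[i - 1] + ny)
            return 0;
    }
    return 1;
}

/* Releases an array created by alloc_contiguous. */
void free_contiguous(double **rows)
{
    if (rows != NULL) {
        free(rows[0]);
        free(rows);
    }
}
```

In the contiguous layout, element [4][9] of a 5x10 array is also element 49 of the single block that rows[0] points to.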

More on dynamically allocated arrays


The previous tip discussed how you can display dynamically allocated arrays. You saw that you could cast memory into an array and then dive to see its data. Still, what you saw wasn't quite right. What you really want is to see one unified array. This is what the Dive in All command will do. The following figure compares regular diving and Dive in All.
  • The top snapshot contains the array of pointers.
  • The second snapshot is an intermediate dive.
  • The center snapshot shows the vector displayed when you dive on one of these pointers.
  • The bottom snapshot shows the entire array as it will be put together by the Dive in All command.

Because TotalView acts upon the information displayed in a window, you can use commands such as Tools > Visualize and Tools > Statistics to obtain information about this array, an array that really only exists within TotalView.


How do I reexecute some code without restarting my program?


After you've walked through a piece of code, you may want to reexecute it. However, restarting the program to get to that place would take a long time. Here's what you can do:
  1. Select the line at which you wish execution to resume.
  2. Select the Thread > Set PC command.
  3. TotalView displays a question box that asks if this is really what you want to do. Answer Yes.

While execution will resume at this point, nothing will be reset. So, there's some risk that your program will crash; for example, setting the PC could cause something inadvertent to occur, such as allocating memory that had already been allocated.

You can also use this technique to move the PC forward so that it skips over code you don't want your program to execute.

Can TotalView animate how my array data is changing?


When you use the Variable Window's Tools > Visualize command, TotalView will bring up a new window containing a visual depiction of an array. The graph being displayed only changes when you again select the Visualize command.
In many cases, you would like TotalView to automatically update the graph so that you see how your data changes as your program executes. You can do this using an eval point. Here's how:
  1. Select a line in your program and right click on the line number. Select Properties from the context menu. TotalView then displays its Action Point Properties Window.
  2. Select the Evaluate Button at the top of the window and then use the $visualize function to name an array. Here's an example:
  3. Click OK, then start your program by selecting the Go Button.

    Drawing the graph can take a lot of time, which means that you really don't want to show every change that occurs. In this example, TotalView was told only to display the graph when i was either equal to 4 or 8. It was also told to halt when i was equal to 6. This was done so that I could stop execution to take a longer look at the graph.

How do I animate an array slice?


You probably don't want to animate a large array because it takes too much time to resend all the array's data to the Visualizer. You can increase performance by visualizing an array slice. The syntax of the $visualize function is not what you would expect, as TotalView requires that you send the slice as a second argument. Here's how you'd visualize a C array:

$visualize(my_array,"[12][:]");

Here's the same array in Fortran:

$visualize(my_array,'(:,13)' )

Note these differences:
  • Most important difference: in C, place the slice within double quotes. In Fortran, use a single quote.
  • C and Fortran store arrays in different ways.
  • The first C array element is always element 0. In Fortran, it can vary. However, the default is 1.
If you are displaying an array (not visualizing it) from within the CLI, you can append the slice directly to the array name and you could also add a stride. For example, here's how to see every 10th element of an array:

dprint {my_array[::10]}


How do I use Help?


When you select a help button from within TotalView or select a Help menu command, TotalView either connects to your browser or opens a new browser window.

What do you do if a browser doesn't come up?
If TotalView can't find a browser, you can tell it where one exists by setting the TV_HTMLHELP_VIEWER environment variable. For example:

setenv TV_HTMLHELP_VIEWER /usr/bin/netscape

Help Loads Slowly. Why?
It's doing a lot and reading a lot of data. A browser typically loads everything (or nearly everything) before it displays things. Because it loads slowly, you're better off minimizing the browser window rather than killing it after you've read a topic.

What are the tabs on the left?
There are actually two help systems. One uses Java and the other uses JavaScript. Help will figure out which it can use.

The lower right snapshot is a section of the left pane of the Java-based system. The upper left is the JavaScript version. There are either three or four tabs:
  • The Contents tab contains the table of contents for all of the TotalView documentation. (The web version contains the release notes. The version shipped with TotalView does not.) Notice that TotalView Tips are part of help.
  • The Index tab contains a combined index for all books.
  • The Search tab allows you to search for information in one book or all of the books.
  • The Favorites tab (Java only) allows you to save topics for easy recall later.
What do the buttons on the right do?
Here's what these buttons do:


Evaluating expressions


While the TotalView expressions system works behind the scenes in many places, the one place where it is completely visible is the Tools > Evaluate Window. Here's a simple expression:

Note the following:
  • The scope of the evaluation is the control group. This means that TotalView evaluates a value in each of the group's worker threads. Be careful what you choose here. In most cases, running the evaluation within the current thread is really what you want to do.
  • The ++ operator increments the value of vary_sleep in the program's memory. Consequently, when the program resumes execution, the variable's value differs from what it was when the program stopped executing.
  • TotalView automatically displays the results of the evaluation. There is no need to explicitly print the results.

Evaluating expressions containing functions


The following example shows two Evaluate Windows. The upper left window calls a C++ member function. The second adds two additional calls.

This example uses the TotalView 7.0 beta software. While you can evaluate functions in shipping versions, you can't evaluate member functions in them.

In a similar way, you can use these expressions in the Expression List Window. However, because there can be side effects to evaluating methods and functions, the results displayed may not always be accurate. Here are two examples. The upper left window contains two invocations of x.get_memb1(). Because set_memb1() changes the value, TotalView shows each with its own value and each was correct when TotalView evaluated the expression.

The bottom right window shows the results of invoking the Window > Update command. Things now look OK in this contrived example. Here's an equally contrived example where values will always differ.

In this case, if TotalView tried to ensure that all values displayed were what was in memory, it would be forced into an infinite loop. While the side effects are obvious in these examples, they might not be in your program.

Why is there expression evaluation within TotalView?


Either directly or indirectly, accessing and manipulating data requires an evaluation system. When your program (and TotalView, of course) accesses data, it must determine where this data resides. The simplest data lookups involve two operations: looking up an address in your program's symbol table and interpreting the information located at this address based on a variable's datatype. For simple variables such as an integer or a floating point number, this is all pretty straightforward.

Looking up array data is slightly more complicated. For example, if the program wants my_var[9] (this tip uses C and C++ notation), it looks up the array's starting address, then applies an offset to locate the array's 10th element. In this case, if each array element uses 32 bits, my_var[9] is located 9 times 32 bits away from the start of the array.

Unfortunately, array references are usually more complicated than this. In most cases, your program uses variables or expressions as array indices instead of integer constants; for example, my_var[cntr] or my_var[cntr+3]. In the latter case, TotalView must determine the value of cntr+3 before it can access an array element.

Using variables and expressions as array indices is common. However, the array index can also be an integer returned by a function. For example:

my_var[access_func(first_var, second_var)+2]

In this example, a function with two arguments returns a value. That returned value is incremented by two, and the resulting value becomes the array index.
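The lookups described in this tip can be written out in C. This is a sketch under stated assumptions: the elements are 32-bit integers, and the body of access_func is made up, because the tip doesn't say what the function does. With 32-bit elements, the offset of my_var[9] is 9 times 4 bytes, or 36 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset of element `index` from the start of an array whose elements
 * are element_size bytes each. */
size_t element_offset(size_t index, size_t element_size)
{
    return index * element_size;
}

/* Hypothetical stand-in for access_func in the text; any function that
 * returns an integer would do. */
int access_func(int first_var, int second_var)
{
    return first_var + second_var;
}

/* Evaluates the index expression first, then uses it to fetch the element,
 * the same order of operations the text describes. */
int32_t fetch(const int32_t *my_var, size_t length, int first_var, int second_var)
{
    int index = access_func(first_var, second_var) + 2;

    assert(my_var != NULL && index >= 0 && (size_t)index < length);
    return my_var[index];
}
```

With this stand-in, fetch(my_var, length, 3, 4) evaluates access_func(3, 4) + 2, gets 9, and then reads my_var[9].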

Here is an illustration showing TotalView accessing the my_var array in the four ways discussed in this tip:


Using the call graph


The call graph is a diagram that shows all the currently active routines. These routines are in all of your currently executing processes and threads. To display a call graph, choose the Tools > Call Graph command in the Process Window.

The TotalView call graph is a dynamic call graph in that it only displays the routines that are on the stack at the time when you invoked this command. The arrows within the graph indicate that one routine is called by another.

Pressing the Update button tells TotalView to recreate this display.

You can limit the display to a group of processes and threads by using the scope selector at the top of the window. If you don't use this control, TotalView displays a call graph for the group defined in the toolbar of your Process Window. If TotalView is displaying the call graph for a multiprocess or multithreaded program, numbers next to the arrows indicate which thread has the routine on its call stack.

If you dive on a routine within the call graph, TotalView creates a group called call_graph. This group contains all of the threads that have the routine on their call stacks. If you look at the Process Window's Processes tab, you'll see that the call_graph set is selected in the scope pulldown. In addition, the context of the Process Window changes to the first thread in this set.

How do I save a call graph?


If you select the Save As button at the bottom of the Call Graph Window, TotalView lets you save call graph information as a Graphviz dot file. At a later time, you can use Graphviz to recreate this information.

How do I see which files my application is performing I/O upon?


Most operating systems let you list the open files a process is using. The way that this is done, if it's possible, is platform specific. Here are two tricks that you can use to see your open files on Linux systems. Begin by defining TotalView CLI Tcl procedures. One lists /proc/pid/fd and the other runs lsof:

proc procfd {} {
    exec ls -l /proc/[f p TV::process get [TV::focus_processes] syspid]/fd
}

proc lsof {} {
    exec lsof -p [f p TV::process get [TV::focus_processes] syspid]
}

The procfd procedure, which should work on any Linux system, lists the contents of /proc/pid/fd for the focus process. For example:

d1.<> procfd
total 0
lrwx------ 1 a_user totalvue 64 Jul 25 07:59 0 -> /dev/pts/6
lrwx------ 1 a_user totalvue 64 Jul 25 08:01 2 -> /dev/pts/6
lrwx------ 1 a_user totalvue 64 Jul 25 08:01 3 -> socket:[18255404]
lrwx------ 1 a_user totalvue 64 Jul 25 08:01 4 -> socket:[18255410]
lr-x------ 1 a_user totalvue 64 Jul 25 08:01 5 -> /proc/10756/maps
lr-x------ 1 a_user totalvue 64 Jul 25 08:01 6 -> /proc/10763/maps
d1.<>
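
The listing above is nothing more than the symbolic links in /proc/pid/fd, so the same information can be read by any program on a Linux system. Here is a minimal Python sketch of that idea; the function name list_open_fds is hypothetical, and the code assumes a Linux /proc file system.

```python
import os


def list_open_fds(pid="self"):
    """Return {fd_number: target} for a process by reading /proc/<pid>/fd (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    result = {}
    for name in os.listdir(fd_dir):
        try:
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return result


if __name__ == "__main__":
    for fd, target in sorted(list_open_fds().items()):
        print(f"{fd} -> {target}")
```

Passing a process ID instead of the default "self" reads another process's descriptors, subject to the usual permission checks on /proc.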

The lsof procedure runs the lsof program. Although lsof is portable across more operating systems, it is not a standard utility, so you may need to install it (for example, yum install lsof on Red Hat systems) or build it from source. Also, some systems may be configured so that you need to run it as root.

Here is an example of the output from the lsof Tcl procedure:

d1.<> lsof
COMMAND   PID USER   FD  TYPE DEVICE      SIZE     NODE NAME
cli     10752 a_user cwd DIR     8,5     53248 10552005 /home/a_user/../totalview/src/structures
cli     10752 a_user rtd DIR     8,2      4096        2 /
cli     10752 a_user txt REG     8,5 190104566   655922 /home/a_user/.../totalview/src/structures/cli
cli     10752 a_user mem REG     8,2    125728   488984 /lib/ld-2.5.so
cli     10752 a_user mem REG     8,2   1585788   489013 /lib/libc-2.5.so
cli     10752 a_user mem REG     8,2     16528   489014 /lib/libdl-2.5.so
cli     10752 a_user mem REG     8,2    208344   489017 /lib/libm-2.5.so
cli     10752 a_user mem REG     8,2    125668   489019 /lib/libpthread-2.5.so
cli     10752 a_user mem REG     8,2     76396   489029 /lib/libresolv-2.5.so
cli     10752 a_user mem REG     8,2      7720   489032 /lib/libcom_err.so.2.1
cli     10752 a_user mem REG     8,2     15264   489035 /lib/libutil-2.5.so
cli     10752 a_user mem REG     8,2    297464  1661452 /usr/lib/libncurses.so.5.5
cli     10752 a_user mem REG     8,2    242880   489030
...
d1.<>


Starting MPI programs from a shell


The Parallel tab within the File > New Program dialog box lets you select which MPI you will be using and other MPI-related information. Before this tab was added, the only way to start an MPI program was to start it from a shell prompt. While the old method still works and is the only way to start an MPI job in some cases, the new method is easier to use: you simply type totalview, and TotalView remembers what you previously entered.

Starting TotalView using the older method has the advantage that the Process Window is immediately shown and that you do not need to enter additional information. TotalView version 8.6 command-line options let you enter this additional information using the new method without having to display its Startup Parameters or New Program dialog boxes. These options tell TotalView to bypass these dialog boxes in the same way that the old method bypassed them.

Here are the new command-line options and their definitions:

-mpi starter
Names the MPI that your program requires. The starter names that you can enter are those that appear in the pulldown list on the Parallel tab of the New Program dialog box. If the starter name has more than one word (for example, Open MPI), enclose the name in quotes. For example:

-mpi "Open MPI"

-starter_args "arguments"
Tells TotalView to pass arguments to the starter program. You can omit the quotation marks if arguments does not have embedded spaces.

-np num or -procs num or -tasks num
Specifies how many tasks TotalView should launch for the job.

-nodes num
Specifies the number of nodes upon which the MPI job will run.

You must also use the -no_show_startup_parameters command-line option.

Here's an example:

totalview my_prog -mpi MPICH -np 4 -no_show_startup_parameters
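
If you launch jobs from a script, the quoting rule for multiword starter names is easy to get wrong. The Python sketch below assembles the command from the options described above and lets shlex handle the quoting. The function totalview_command and its parameter names are hypothetical, and the argument order simply follows the example above.

```python
import shlex


def totalview_command(program, mpi, np=None, nodes=None, starter_args=None):
    """Assemble a totalview command line from the options described above."""
    argv = ["totalview", program, "-mpi", mpi]
    if starter_args is not None:
        argv += ["-starter_args", starter_args]
    if np is not None:
        argv += ["-np", str(np)]
    if nodes is not None:
        argv += ["-nodes", str(nodes)]
    # Required so that TotalView bypasses the startup dialog boxes.
    argv.append("-no_show_startup_parameters")
    return argv


# shlex.join quotes any argument that contains spaces, such as "Open MPI".
print(shlex.join(totalview_command("my_prog", "MPICH", np=4)))
print(shlex.join(totalview_command("my_prog", "Open MPI", np=8, nodes=2)))
```

Note that shlex.join uses single quotes, which a POSIX shell treats the same as the double quotes shown earlier when the argument contains only letters and spaces.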

If you want to enable the memory debugger or ReplayEngine, you can do so from the command line by using the -memory_debugging and -replay options.