Run-Control versus Trace for debugging

New Trace features are coming on Intel processors. How do Trace tools make debugging more effective? 

Most debug engineers who work on low-level bugs are familiar with run-control. It has been around for decades and is available on most commercial processors. Using JTAG and some sideband signals, it allows external emulator hardware and debug software to access and control CPUs for debug purposes. Run-control is used to stop and start processors, read and write registers, I/O and memory, and perform functions such as setting breakpoints and single-stepping through code.

For Intel platforms, examples of emulator hardware include the Intel® In-Target Probe (ITP) and the ASSET Arium ECM-XDP3e probe. Examples of debugging applications that use this hardware are the Intel Platform Debug Toolkit (PDT) and the ASSET SourcePoint debugger. Debugging applications layer useful capabilities on top of run-control, such as source-level debug, symbol search, scripting, variable watch, and so on.

In the most general sense, run-control allows the debug engineer to step forward in time. For example, a debug engineer may place a breakpoint at a suspect spot in the code. When the breakpoint is hit, the system halts, and the debugger gains control. The engineer may then examine the state of the system at the break, view the content of variables at that point, and then move forward in the code, gaining insight into the source of the bug. This can be seen below, where the arrow denotes the instruction pointer, and the red and green blobs denote hardware and software breakpoints respectively:

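[Figure: debugger Source window showing the instruction pointer arrow and the hardware (red) and software (green) breakpoints]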

Unfortunately, run-control, in and of itself, may give little insight into what happened leading up to a bug. This is particularly true in the case of asynchronous events such as interrupts, which do not show up in the normal user code flow. These events may trample on data that does not belong to them. Bugs may be introduced by any code running on any thread, core or socket. Most systems also contain numerous separate agents (or engines), such as the Intel Management Engine (ME), that run code asynchronously and cannot be observed through run-control alone.

Trace, on the other hand, allows the debug engineer to look backwards in time. The developer can then see code that does not appear in the normal user code flow. Trace captures the flow of asynchronous events and makes it easier to correlate, in time, the code running on different engines within the system.

So let’s say a bug occurs very intermittently in an Ethernet driver and seems to arise from some sort of data corruption. Trace is first used to look at the normal code flow without the bug. A breakpoint is then set within the area of code where the corruption is thought to occur. The system is restarted and runs for as long as needed (minutes? hours? days?) until the error occurs and the breakpoint is hit. The trace buffer can then be examined to find the root cause of the bug, which in this case might be simultaneous Ethernet and USB activity.
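
To make the idea concrete, here is a toy Python model of that workflow. It is only a sketch and does not use any real trace hardware or debugger API: the names TraceBuffer, run_until_corruption, eth_driver, usb_isr and rx_byte are all invented for illustration. It shows a fixed-size trace buffer that keeps only the most recent records, a "breakpoint" that halts when the Ethernet driver reads a corrupted value, and a look backwards through the buffer to find the last writer.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceRecord:
    timestamp: int   # tick at which the event ran
    source: str      # hypothetical agent name, e.g. "eth_driver" or "usb_isr"
    event: str       # short description of what ran


class TraceBuffer:
    """Toy ring buffer: the newest records overwrite the oldest."""

    def __init__(self, capacity):
        self._records = deque(maxlen=capacity)

    def record(self, timestamp, source, event):
        self._records.append(TraceRecord(timestamp, source, event))

    def history(self):
        return list(self._records)


def run_until_corruption(events, trace, expected=0xAB):
    """Replay events; halt, like a breakpoint, when the driver reads bad data."""
    shared = {"rx_byte": expected}
    for tick, (source, event, write) in enumerate(events):
        trace.record(tick, source, event)
        if write is not None:
            shared["rx_byte"] = write
        is_driver_read = source == "eth_driver" and event == "read rx_byte"
        if is_driver_read and shared["rx_byte"] != expected:
            return tick
    return None


# Invented event stream: (source, event, value written or None).
events = [
    ("eth_driver", "write rx_byte", 0xAB),
    ("eth_driver", "read rx_byte", None),
    ("eth_driver", "write rx_byte", 0xAB),
    ("usb_isr", "write rx_byte", 0x00),   # asynchronous stray write
    ("eth_driver", "read rx_byte", None),
]

trace = TraceBuffer(capacity=3)
halt_tick = run_until_corruption(events, trace)
trail = trace.history()

# Walk backwards from the halt to find the most recent writer.
culprit = next(r for r in reversed(trail) if r.event.startswith("write"))

for r in trail:
    print(r.timestamp, r.source, r.event)
print("halted at tick", halt_tick, "- last writer:", culprit.source)
```

At the halt, run-control alone would show only the corrupted value. The trail held in the buffer shows which agent wrote it last, which is the kind of backward view that trace provides. Real trace hardware records far more, and far faster, than this model suggests.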

So, trace is an extremely useful debugging tool. It can save weeks or months of development time by helping solve the most elusive, intermittent bugs. Think of trace as the breadcrumbs that Hansel and Gretel left behind in the forest – it might help you get home sooner.

For more on trace, see our eBook, Hardware-Assisted Debug and Trace within the Silicon.