Debugging CATERR in field systems

Let’s say you want to debug a CATERR (catastrophic error) on an Intel x86-based system out in the field, and the CATERR happens only once a week in a given datacenter. An embedded implementation of the Intel In-Target Probe (ITP) would help.

Intel x86-based designs need improved Reliability, Availability and Serviceability (RAS) features to troubleshoot system board crashes, hangs, and No Trouble Found (NTF) problems. An embedded implementation of the Intel ITP can provide these capabilities. Also known as ScanWorks Embedded Diagnostics (SED), this implementation eliminates the need for external cabling and hardware pods. Instead, the run-control library and support software are embedded on board the server, router, wireless base station, or other Intel-based target.

For simpler designs, the SED engine can reside on a service processor or BMC (baseboard management controller):

 

[Figure: SED BMC]

For higher-end, high-reliability systems, the SED engine is divided between the service processor and an on-board FPGA:

[Figure: SED FPGA]

For the FPGA implementation, the interface to the SED engine is a simple VHDL port that behaves like a standard Wishbone slave interface. The port comprises the following (example only):

rst_i: Reset input

clk_i: Clock input (100 MHz)

addr_i[8:0]: D-word (32-bit) aligned byte address

sel_i[3:0]: Select input array

cyc_i: Cycle input

stb_i: Strobe input

lock_i: Lock input

we_i: Write enable input

dat_i[31:0]: Data input array

ack_o: Acknowledge output

dat_o[31:0]: Data output array

jtag_clk: JTAG source clock input (see note)

TRST: Test Logic Reset output from embedded TAP Controller

TCK: Test Clock output from embedded TAP Controller

TMS: Test Mode Select output from embedded TAP Controller

TDI: Test Data In (UUT referenced) output from embedded TAP Controller

TDO: Test Data Out (UUT referenced) input to embedded TAP Controller

PREQ

PRDY

DBR

RESET

oe_n: Output enable for all outbound SED signals (JTAG and sideband)
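
To make the bus side of this port list more concrete, here is a minimal behavioural sketch in Python of how a generic Wishbone classic slave responds to single read and write cycles. It is not the SED engine or its register map: the class name WishboneSlaveModel, the internal register store, and the address 0x010 are assumptions made purely for illustration. Only the signal names and widths (9-bit D-word aligned byte address, 32-bit data, 4-bit byte select) come from the list above.

```python
from dataclasses import dataclass, field

ADDR_BITS = 9            # addr_i[8:0]: byte address, D-word aligned
DATA_MASK = 0xFFFFFFFF   # dat_i / dat_o are 32 bits wide


@dataclass
class WishboneSlaveModel:
    """Toy model of a Wishbone classic slave (hypothetical, not the SED engine)."""
    regs: dict = field(default_factory=dict)  # D-word index -> 32-bit value

    def cycle(self, addr_i, sel_i=0xF, we_i=0, dat_i=0, cyc_i=1, stb_i=1):
        """Evaluate one bus cycle and return (ack_o, dat_o)."""
        if not (cyc_i and stb_i):
            return 0, 0                       # slave not addressed: no acknowledge
        if addr_i >> ADDR_BITS or addr_i & 0x3:
            raise ValueError("address must fit in 9 bits and be D-word aligned")
        index = addr_i >> 2                   # byte address -> D-word index (0..127)
        current = self.regs.get(index, 0)
        if we_i:
            # Build a 32-bit mask from the four byte-lane selects in sel_i[3:0].
            mask = 0
            for lane in range(4):
                if sel_i & (1 << lane):
                    mask |= 0xFF << (8 * lane)
            self.regs[index] = ((current & ~mask) | (dat_i & mask)) & DATA_MASK
            return 1, 0                       # write acknowledged
        return 1, current                     # read acknowledged with data


slave = WishboneSlaveModel()
slave.cycle(addr_i=0x010, we_i=1, dat_i=0xDEADBEEF)             # full 32-bit write
slave.cycle(addr_i=0x010, we_i=1, dat_i=0x00000012, sel_i=0x1)  # low byte only
ack, data = slave.cycle(addr_i=0x010)                           # read back
print(ack, hex(data))  # prints: 1 0xdeadbe12
```

The 9-bit byte address gives a 512-byte window, or 128 D-words. The model ignores rst_i, clk_i, and lock_i, and collapses each transfer into a single function call, so it illustrates the handshake and byte-lane behaviour rather than timing.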

Once in place, the SED ITP functionality can be invoked either manually or in response to system events, such as a CATERR, to provide a complete forensics system dump.
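
The two invocation paths, manual and event-driven, could be wired together roughly as follows. This is a sketch under stated assumptions rather than the SED software interface: DiagnosticTrigger, arm, on_event, run_manually, and the stand-in dump function are all hypothetical names, and a real system would call into the embedded run-control library where the placeholder sits.

```python
from typing import Callable, List, Optional, Set

DumpFn = Callable[[str], dict]  # takes the trigger name, returns a dump record


class DiagnosticTrigger:
    """Hypothetical glue that runs a dump manually or when an armed event fires."""

    def __init__(self, dump_fn: DumpFn) -> None:
        self._dump_fn = dump_fn
        self._armed: Set[str] = set()
        self.history: List[dict] = []

    def arm(self, event: str) -> None:
        """Request a dump whenever this event name is reported."""
        self._armed.add(event)

    def run_manually(self) -> dict:
        """Operator-initiated dump."""
        return self._run("manual")

    def on_event(self, event: str) -> Optional[dict]:
        """Called by the platform's event monitor; ignores unarmed events."""
        if event in self._armed:
            return self._run(event)
        return None

    def _run(self, trigger: str) -> dict:
        record = self._dump_fn(trigger)
        self.history.append(record)
        return record


def placeholder_dump(trigger: str) -> dict:
    # Stand-in only: a real dump would be collected by the run-control library.
    return {"trigger": trigger}


trigger = DiagnosticTrigger(placeholder_dump)
trigger.arm("CATERR")
trigger.on_event("OTHER_EVENT")   # not armed, so nothing is collected
trigger.on_event("CATERR")        # armed, so a dump record is stored
trigger.run_manually()
print([record["trigger"] for record in trigger.history])  # prints: ['CATERR', 'manual']
```

Keeping the dump function injectable means the same trigger logic can be exercised on a development host with a stand-in before it is connected to real hardware.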

More technical information is available here: ScanWorks Embedded Diagnostics. The need for embedded diagnostics and its return on investment are described in this white paper: Embedded Diagnostics for Highly Available Systems – Whitepaper.