In my webinar with the UEFI Forum, I demonstrated how JTAG functionality within BMCs can be used to perform out-of-band debug. This is a tutorial on the coding practices for using the SED API.
As a backgrounder, you can watch the webinar on YouTube, starting at about 28 minutes in, where I demonstrated a simple ‘C’ application using embedded ITP to exercise a PCI Express port. This app, called ltloop, stresses the Link Training and Status State Machine (LTSSM) on a PCIe connection, repeatedly retraining the port and looking for induced errors due to chip, board or firmware marginalities. I wrote a little about the ltloop application in my blog PCI Express LTSSM stress using BMC-based Embedded JTAG/ITP.
But first, some more background on what embedded ITP is all about.
Embedded ITP implements in-situ diagnostics via direct support of the x86 JTAG-based run-control API down on the target. The run-control API calls are synonymous with lower-level Intel In-Target Probe (ITP) procedures. We call this ScanWorks Embedded Diagnostics (or SED for short). Running directly down on the BMC as an embedded debug agent, it operates completely out-of-band, taking advantage of the “bare-metal” nature of JTAG. This gives it several advantages over traditional debugging agents:
- Speed: Since there is no handshaking to/from a remote host, the Ethernet bottleneck is eliminated, resulting in a tremendous performance boost.
- Scalability: Applications can be run independently and simultaneously across multiple targets, without the need for a remote host. This is truly at-scale debug.
- Security: SED runs down on the target, with no remote host required. This eliminates the need for perhaps untrusted external network access to run the debug functions.
Embedded ITP requires a hardware connection between the BMC and the CPU JTAG chain, so that the BMC can drive the debug logic for run-control. You can see this from a block diagram available on the Open Compute Project wiki page for Microsoft Azure’s Project Olympus (note: click on the “Electrical Collateral” link on that page to retrieve the zipped schematics):
Circled in red is the standard XDP interface, to which benchtop JTAG debuggers connect. Note that the BMC can drive this too, with embedded ITP functionality.
A close-up of the BMC/CPU overall topology is as below:
The service processor, or baseboard management controller (BMC) as it is termed on server platforms, can host a small ‘C’ library instantiating the ITP functions. A subset of the API presented is as follows:
- EnterDebugMode
- SetActiveCPU
- SetActiveCore
- SetActiveThread
- ReadGPR
- ReadMSR
- ReadIO
- ReadCSR
- ReadMemory
- DownloadUserDiag
- ExecuteUserDiag
- UploadDiags
In SED, applications that write directly to the API and run down on the BMC are termed On-Target Diagnostics, or OTD for short. Since ITP presents a rich set of capabilities for hardware validation, debug, and test functions, many routines can use this environment to create out-of-band utilities. ASSET provides a standard environment for the creation of OTDs, with full documentation.
A good training example available from our source code library is dumpmca, a small routine that dumps the contents of the machine check error (MCE) banks. Chapter 15 of Volume 3 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual (SDM) is dedicated to a full description of the Intel Machine Check Architecture, and the essence of the OTD is simply to extract and display the value of the specified MCE bank. As shown in last week’s blog, an example of the console output is below:
    >./dumpmca -s1 -c1 -b0

    dump MCA register bank
    Library version = 0.22.04
    Selecting socket 1

    Global machine check registers:
    MCG Cap register: 0x000000000f000c1c
    MCG Status register: 0x0000000000000000
    MCG Control register: 0x0000000000000001
    MCG EXT Control register: 0x0000000000000000

    Machine check registers for socket: 1 core: 1 bank: 0
    IA32_MC0_CTL: 0x0000000000000fff
    IA32_MC0_STATUS: 0x0000000000000000
    IA32_MC0_ADDR: 0x0000000000000000
    IA32_MC0_MISC: 0x0000000000000000
    IA32_MC0_CTL2: 0x0000000000000000
    Time for test: 0.17 seconds.

    Done, exiting debug mode.
It’s good to understand a little about Intel Machine Check Architecture before diving into the code. Firstly, a general topology of error classifications can be seen in the article Autonomic Foundation for Fault Diagnosis in the Intel Technology Journal, Volume 16, Issue 2, 2012. See below:
Detectable but Uncorrected Errors (DUE) can manifest themselves via blue screens or other system hangs/crashes. In Intel designs, internal processor errors, such as a processor instruction retirement watchdog timeout (or three-strike timeout), “wedge” the system, cause a CATERR assertion, and can only be recovered from by a system reset. Identifying the root cause of such events is notoriously difficult, as the system is effectively wedged and cannot be put into full probe mode by JTAG-assisted hardware debuggers. In such extreme cases the machine check exception handler at vector 18 (0x12) does not execute correctly. But some breadcrumbs can still be retrieved, especially by SED-based OTDs.
As an aside, a good Intel reference on processor instruction retirement watchdog timeouts can be found here: Processor Reorder Buffer (ROB) Timeout Debug Guide. Keep in mind that ROB timeouts are only one of many types of internal, catastrophic errors. This document is a little dated, but does give a good high-level overview.
To understand the more technical detail, excellent public references on this are the Machine Check Architecture and Interpreting Machine Check Error Codes chapters within the Intel® 64 and IA-32 Architectures Software Developer’s Manual (the current version is dated May 2020). Chapter 15 of Volume 3 makes for great bedtime reading. I’ll excerpt the techniques relevant to triaging a CATERR on an Intel Ice Lake server part. As a reference, since the machine check MSRs play such a critical role in root cause resolution, here’s an image excerpt from the SDM in Chapter 15:
It’s easy to correlate the global control MSRs and the per-core/thread banks from the output of dumpmca() above. For example, we see that the IA32_MCG_CAP MSR = 0x000000000f000c1c. And looking more closely at this register definition in the SDM:
It’s an exercise for the student to map each of the bitfields to the value of the MSR. Of course, this can be done programmatically, and is in fact available within the CScripts.
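For readers who would rather not do the exercise by hand, here is a minimal sketch of the bitfield mapping in C. The field positions follow the IA32_MCG_CAP layout in the SDM; the struct, the helper names (`mcg_cap_t`, `decode_mcg_cap`, `print_mcg_cap`) and the output wording are my own illustrative choices and are not part of the SED library or the CScripts.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Decoded view of IA32_MCG_CAP (MSR 0x179). Hypothetical helper type. */
typedef struct {
    unsigned count;    /* bits 7:0   number of error-reporting banks      */
    unsigned ctl_p;    /* bit 8      IA32_MCG_CTL present                 */
    unsigned ext_p;    /* bit 9      extended machine check state present */
    unsigned cmci_p;   /* bit 10     corrected MC error interrupt present */
    unsigned tes_p;    /* bit 11     threshold-based error status present */
    unsigned ext_cnt;  /* bits 23:16 number of extended state registers   */
    unsigned ser_p;    /* bit 24     software error recovery supported    */
    unsigned emc_p;    /* bit 25     enhanced machine check capability    */
    unsigned elog_p;   /* bit 26     extended error logging               */
    unsigned lmce_p;   /* bit 27     local machine check exception        */
} mcg_cap_t;

/* Extract 'width' bits of 'v' starting at bit 'lo'. */
static unsigned field(uint64_t v, unsigned lo, unsigned width)
{
    assert(width > 0 && width < 32 && lo + width <= 64);
    return (unsigned)((v >> lo) & ((1ULL << width) - 1));
}

mcg_cap_t decode_mcg_cap(uint64_t raw)
{
    mcg_cap_t c;
    c.count   = field(raw, 0, 8);
    c.ctl_p   = field(raw, 8, 1);
    c.ext_p   = field(raw, 9, 1);
    c.cmci_p  = field(raw, 10, 1);
    c.tes_p   = field(raw, 11, 1);
    c.ext_cnt = field(raw, 16, 8);
    c.ser_p   = field(raw, 24, 1);
    c.emc_p   = field(raw, 25, 1);
    c.elog_p  = field(raw, 26, 1);
    c.lmce_p  = field(raw, 27, 1);
    return c;
}

/* Print one line per field for a raw IA32_MCG_CAP value. */
void print_mcg_cap(uint64_t raw)
{
    mcg_cap_t c = decode_mcg_cap(raw);
    printf("IA32_MCG_CAP = 0x%016" PRIx64 "\n", raw);
    printf("  Count       = %u\n", c.count);
    printf("  MCG_CTL_P   = %u\n", c.ctl_p);
    printf("  MCG_EXT_P   = %u\n", c.ext_p);
    printf("  MCG_CMCI_P  = %u\n", c.cmci_p);
    printf("  MCG_TES_P   = %u\n", c.tes_p);
    printf("  MCG_EXT_CNT = %u\n", c.ext_cnt);
    printf("  MCG_SER_P   = %u\n", c.ser_p);
    printf("  MCG_EMC_P   = %u\n", c.emc_p);
    printf("  MCG_ELOG_P  = %u\n", c.elog_p);
    printf("  MCG_LMCE_P  = %u\n", c.lmce_p);
}
```

For the value in the console output, `decode_mcg_cap(0x000000000f000c1cULL)` gives a bank count of 28, with the CMCI_P, TES_P, SER_P, EMC_P, ELOG_P and LMCE_P bits set.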
In a typical BMC environment equipped with SED, the BMC awaits assertion of the #MC signal (CATERR or MSMI), and then acts accordingly. The first step is usually to query the uncore registers, also known as CSRs (Configuration and Status Registers), via PCI configuration space. This is because the uncore comprises the shared last-level cache (LLC), CHA, IMC, PCU, Ubox, IIO, and UPI modules. The PCU captures the error sources in the MCA_ERR_SRC_LOG register. This usually indicts the offending socket, and characterizes the fault as an MCERR or IERR. The SED API that reads the uncore registers (in fact any CSR) has the following form:
int ai_mReadCSR(int mHandle, uint16_t DeviceNo, uint16_t FunctionNo, uint16_t Offset, uint32_t *RegisterData);
After this, it’s a matter of reading the applicable MC MSRs from all the processor cores within the socket that asserted the IERR (presuming the cores are available and not completely wedged). Chapter 16 of the SDM is an excellent reference for the methodology to be used. For example, Section 16.11 outlines the internal error codes for machine check errors in the register bank IA32_MC4_STATUS. So, just use ReadMSR to retrieve it:
int ai_mReadMSR (int mHandle, uint64_t MsrAddress, uint64_t *RegisterData);
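Once a STATUS value is in hand, the first triage step is usually to split it into its architectural fields. The sketch below assumes the IA32_MCi_STATUS layout from the SDM (VAL at bit 63 down to PCC at bit 57, the model-specific code in bits 31:16, and the MCA error code in bits 15:0). The type and function names are hypothetical, and the sample value mentioned afterwards is made up for illustration rather than captured from a real failure.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Architectural fields of IA32_MCi_STATUS. Hypothetical helper type. */
typedef struct {
    bool val;        /* bit 63: register contents are valid          */
    bool over;       /* bit 62: an earlier error was overwritten     */
    bool uc;         /* bit 61: error was not corrected              */
    bool en;         /* bit 60: reporting enabled in IA32_MCi_CTL    */
    bool miscv;      /* bit 59: IA32_MCi_MISC holds valid data       */
    bool addrv;      /* bit 58: IA32_MCi_ADDR holds valid data       */
    bool pcc;        /* bit 57: processor context may be corrupt     */
    uint16_t mscod;  /* bits 31:16: model-specific error code        */
    uint16_t mcacod; /* bits 15:0:  architectural MCA error code     */
} mci_status_t;

/* Test a single bit of a 64-bit register value. */
static bool bit(uint64_t v, unsigned n)
{
    assert(n < 64);
    return ((v >> n) & 1ULL) != 0;
}

mci_status_t decode_mci_status(uint64_t raw)
{
    mci_status_t s;
    s.val    = bit(raw, 63);
    s.over   = bit(raw, 62);
    s.uc     = bit(raw, 61);
    s.en     = bit(raw, 60);
    s.miscv  = bit(raw, 59);
    s.addrv  = bit(raw, 58);
    s.pcc    = bit(raw, 57);
    s.mscod  = (uint16_t)((raw >> 16) & 0xFFFF);
    s.mcacod = (uint16_t)(raw & 0xFFFF);
    return s;
}

/* Print a one-bank summary; banks without VAL set carry no error. */
void print_mci_status(unsigned bank, uint64_t raw)
{
    mci_status_t s = decode_mci_status(raw);
    printf("IA32_MC%u_STATUS = 0x%016" PRIx64 "\n", bank, raw);
    if (!s.val) {
        printf("  no valid error logged\n");
        return;
    }
    printf("  OVER=%d UC=%d EN=%d MISCV=%d ADDRV=%d PCC=%d\n",
           s.over, s.uc, s.en, s.miscv, s.addrv, s.pcc);
    printf("  MSCOD=0x%04x MCACOD=0x%04x\n",
           (unsigned)s.mscod, (unsigned)s.mcacod);
}
```

With an invented value such as `0xBE00000000800400`, the decoder reports VAL, UC, EN, MISCV, ADDRV and PCC set, OVER clear, a model-specific code of 0x0080 and an MCA error code of 0x0400. The S and AR bits (56 and 55) are left out here because they are only meaningful when MCG_SER_P is set in IA32_MCG_CAP.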
So, let’s look at an entire routine to read a machine check bank:
    //dumpmca - display contents of Machine Check Architecture registers
    //Uses MSRs to read MCA registers
    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include <sys/time.h>
    #include <dlfcn.h>
    #include <string.h>
    #include <unistd.h>
    #include <itpdriver/itp_driver.h>
    #include <itpdriver/itp_driver1.h>
    #include <itpdriver/defines.h>
    #include <time.h>

    #define MSR_IA32_MCG_CAP     0x179
    #define MSR_IA32_MCG_STATUS  0x17a
    #define MSR_IA32_MCG_CTL     0x17b
    #define MSR_IA32_MCG_EXT_CTL 0x4d0

    //option values
    int m_socket = CPU_ZERO_POS;
    int m_core = CORE_ZERO_POS;
    int m_bank = 0;

    #define MAX_BANK 31

    //MSR addresses indexed by bank number, banks 0..31 (eight entries per row)
    uint64_t msr_ia32_mci_ctl[] = {
        0x400, 0x404, 0x408, 0x40c, 0x410, 0x414, 0x418, 0x41c,
        0x420, 0x424, 0x428, 0x42c, 0x430, 0x434, 0x438, 0x43c,
        0x440, 0x444, 0x448, 0x44c, 0x450, 0x454, 0x458, 0x45c,
        0x460, 0x464, 0x468, 0x46c, 0x470, 0x474, 0x478, 0x47c};
    uint64_t msr_ia32_mci_status[] = {
        0x401, 0x405, 0x409, 0x40d, 0x411, 0x415, 0x419, 0x41d,
        0x421, 0x425, 0x429, 0x42d, 0x431, 0x435, 0x439, 0x43d,
        0x441, 0x445, 0x449, 0x44d, 0x451, 0x455, 0x459, 0x45d,
        0x461, 0x465, 0x469, 0x46d, 0x471, 0x475, 0x479, 0x47d};
    uint64_t msr_ia32_mci_addr[] = {
        0x402, 0x406, 0x40a, 0x40e, 0x412, 0x416, 0x41a, 0x41e,
        0x422, 0x426, 0x42a, 0x42e, 0x432, 0x436, 0x43a, 0x43e,
        0x442, 0x446, 0x44a, 0x44e, 0x452, 0x456, 0x45a, 0x45e,
        0x462, 0x466, 0x46a, 0x46e, 0x472, 0x476, 0x47a, 0x47e};
    uint64_t msr_ia32_mci_misc[] = {
        0x403, 0x407, 0x40b, 0x40f, 0x413, 0x417, 0x41b, 0x41f,
        0x423, 0x427, 0x42b, 0x42f, 0x433, 0x437, 0x43b, 0x43f,
        0x443, 0x447, 0x44b, 0x44f, 0x453, 0x457, 0x45b, 0x45f,
        0x463, 0x467, 0x46b, 0x46f, 0x473, 0x477, 0x47b, 0x47f};
    uint64_t msr_ia32_mci_ctl2[] = {
        0x280, 0x281, 0x282, 0x283, 0x284, 0x285, 0x286, 0x287,
        0x288, 0x289, 0x28a, 0x28b, 0x28c, 0x28d, 0x28e, 0x28f,
        0x290, 0x291, 0x292, 0x293, 0x294, 0x295, 0x296, 0x297,
        0x298, 0x299, 0x29a, 0x29b, 0x29c, 0x29d, 0x29e, 0x29f};

    void dumpmca(int mHandle)
    {
        int iError = 0;
        int bankMax = 0;
        uint64_t regdata;

        //Global registers first: capability, status, control, extended control
        iError = ai_mReadMSR (mHandle, MSR_IA32_MCG_CAP, &regdata);
        if (iError != AI_SUCCESS)
        {
            printf ("\nERROR reading MSR_IA32_MCG_CAP: %s\n" , ai_ErrorToString(iError));
            return;
        }
        printf("\nGlobal machine check registers:\n");
        printf("MCG Cap register: 0x%016llx\n", regdata);

        //The Count field (bits 7:0) is the number of banks, so the highest valid index is Count - 1
        bankMax = (int)(regdata & 0xFF) - 1;
        if ((m_bank < 0) || (m_bank > bankMax))
        {
            printf("Invalid bank specified, must be between 0 and %d\n", bankMax);
            return;
        }
        if (m_bank > MAX_BANK)
        {
            printf("Requested bank is larger than program supports: %d\n", MAX_BANK);
            return;
        }

        iError = ai_mReadMSR (mHandle, MSR_IA32_MCG_STATUS, &regdata);
        if (iError != AI_SUCCESS)
        {
            printf ("\nERROR reading MSR_IA32_MCG_STATUS: %s\n" , ai_ErrorToString(iError));
            return;
        }
        printf("MCG Status register: 0x%016llx\n", regdata);

        iError = ai_mReadMSR (mHandle, MSR_IA32_MCG_CTL, &regdata);
        if (iError != AI_SUCCESS)
        {
            printf ("\nERROR reading MSR_IA32_MCG_CTL: %s\n" , ai_ErrorToString(iError));
            return;
        }
        printf("MCG Control register: 0x%016llx\n", regdata);

        iError = ai_mReadMSR (mHandle, MSR_IA32_MCG_EXT_CTL, &regdata);
        if (iError != AI_SUCCESS)
        {
            printf ("\nERROR reading MSR_IA32_MCG_EXT_CTL: %s\n" , ai_ErrorToString(iError));
            return;
        }
        printf("MCG EXT Control register: 0x%016llx\n", regdata);

        //Then the five registers of the selected error-reporting bank
        printf("\nMachine check registers for socket: %d core: %d bank: %d\n", m_socket, m_core, m_bank);

        iError = ai_mReadMSR (mHandle, msr_ia32_mci_ctl[m_bank], &regdata);
        if (iError != AI_SUCCESS)
        {
            printf ("\nERROR reading MSR_IA32_MCi_CTL: %s\n" , ai_ErrorToString(iError));
            return;
        }
        printf("IA32_MC%d_CTL: 0x%016llx\n", m_bank, regdata);

        iError = ai_mReadMSR (mHandle, msr_ia32_mci_status[m_bank], &regdata);
        if (iError != AI_SUCCESS)
        {
            printf ("\nERROR reading MSR_IA32_MCi_STATUS: %s\n" , ai_ErrorToString(iError));
            return;
        }
        printf("IA32_MC%d_STATUS: 0x%016llx\n", m_bank, regdata);

        iError = ai_mReadMSR (mHandle, msr_ia32_mci_addr[m_bank], &regdata);
        if (iError != AI_SUCCESS)
        {
            printf ("\nERROR reading MSR_IA32_MCi_ADDR: %s\n" , ai_ErrorToString(iError));
            return;
        }
        printf("IA32_MC%d_ADDR: 0x%016llx\n", m_bank, regdata);

        iError = ai_mReadMSR (mHandle, msr_ia32_mci_misc[m_bank], &regdata);
        if (iError != AI_SUCCESS)
        {
            printf ("\nERROR reading MSR_IA32_MCi_MISC: %s\n" , ai_ErrorToString(iError));
            return;
        }
        printf("IA32_MC%d_MISC: 0x%016llx\n", m_bank, regdata);

        iError = ai_mReadMSR (mHandle, msr_ia32_mci_ctl2[m_bank], &regdata);
        if (iError != AI_SUCCESS)
        {
            printf ("\nERROR reading MSR_IA32_MCi_CTL2: %s\n" , ai_ErrorToString(iError));
            return;
        }
        printf("IA32_MC%d_CTL2: 0x%016llx\n", m_bank, regdata);
    }

    void usage(void)
    {
        printf("\nUsage:\n");
        printf("-s<n> socket n=1|2\n");
        printf("-c<n> core\n");
        printf("-b<n> bank n=0..%d\n", MAX_BANK);
    }

    int parseArgs(int argc, char **argv)
    {
        int c;
        int retval = 0;

        while ((c = getopt (argc, argv, "s:c:b:")) != -1)
        {
            switch(c)
            {
            case 's':
                m_socket = atoi(optarg);
                //Will check socket # in main after we get system topology and learn # of CPUs
                break;
            case 'c':
                m_core = atoi(optarg);
                break;
            case 'b':
                m_bank = atoi(optarg);
                break;
            case '?':
            case 'h':
                retval = 1; //Note: caller will treat as error and print usage()
                break;
            } // switch...
        } //while...
        return retval;
    }

    int main (int argc, char **argv)
    {
        int numcores;
        int curcore;
        int numcpus;
        int curcpu;
        int iError = 0;
        bool pwrchk = true, scnsetup = true, savemodarch = true;
        int mHandle;
        FILE *UUTDiagsHexFile;
        char ver[200];
        uint64_t msr;
        uint64_t regdata;
        int i;
        struct timespec start_time;
        struct timespec end_time;
        double secs;
        uint32_t bus;
        uint32_t dev;
        uint32_t fun;
        ai_ITPtopology_t topo;

        UUTDiagsHexFile = NULL;

        printf("\n\ndump MCA register bank\n");

        iError = parseArgs(argc, argv);
        if (iError != 0)
        {
            usage();
            return iError;
        }

        ai_GetLibraryVersion(ver);
        printf("Library version = %s\n", ver);

        //Open the probe channel and configure the target
        AI_pdcselector pdctarget = AI_pdc_0;
        if ((iError = ai_mOpen(pdctarget, 1, &mHandle)) != AI_SUCCESS)
        {
            printf ("\nOpen ERROR: %s Channel %i\n" , ai_ErrorToString(iError), pdctarget);
            return 1;
        }
        if ((iError = ai_mSetTargetCPUType(mHandle, AI_sandybridge)) != AI_SUCCESS)
        {
            printf ("\nSetTargetCPUType: ERROR: %s Channel %i\n" , ai_ErrorToString(iError), pdctarget);
            return 1;
        }
        ai_mConfig (mHandle, 100, UUTDiagsHexFile, 0x10000LL, pwrchk, scnsetup, savemodarch);

        //Learn the scan chain topology, then validate the requested socket and core
        iError = ai_mGetITPScanChainTopology(mHandle, &topo, true);
        if (iError != AI_SUCCESS)
        {
            printf ("\nERROR getting target topology: %s\n" , ai_ErrorToString(iError));
            return iError;
        }

        numcpus = topo.tck[TCK_ZERO_POS].numcpus;
        if ((m_socket < CPU_ZERO_POS) || (m_socket >= (CPU_ZERO_POS + numcpus)))
        {
            printf("Invalid socket number, must be between %hu and %hu\n", CPU_ZERO_POS, numcpus);
            return -1;
        }

        numcores = topo.tck[TCK_ZERO_POS].cpu[CPU_ZERO_POS].numcores;
        if ((m_core < CORE_ZERO_POS) || (m_core >= (CORE_ZERO_POS + numcores)))
        {
            printf("Invalid core number, must be between %hu and %hu\n", CORE_ZERO_POS, numcores);
            return -1;
        }

        printf("Selecting socket %hu\n", m_socket);
        if ((iError = ai_mSetActiveCPU(mHandle, m_socket)) != AI_SUCCESS)
        {
            printf ("\nSetActiveCPU: ERROR: %s Socket %hu\n" , ai_ErrorToString(iError), m_socket);
            return 1;
        }
        ai_mSetActiveCore(mHandle, m_core);
        ai_mSetActiveThread(mHandle, THREAD_ZERO_POS);

        //Time the register dump
        clock_gettime(CLOCK_MONOTONIC, &start_time);
        dumpmca(mHandle);
        clock_gettime(CLOCK_MONOTONIC, &end_time);
        secs = (double)(end_time.tv_sec - start_time.tv_sec) +
               (double)(end_time.tv_nsec - start_time.tv_nsec) / 1000000000.0;
        printf("Time for test: %7.2f seconds.\n", secs);

        printf("\nDone, exiting debug mode.\n");
        iError = ai_mExitDebugMode(mHandle); //Attempt to leave gracefully
        if (iError != AI_SUCCESS)
        {
            printf ("\nError with Exit Debug Mode: %s\n" , ai_ErrorToString(iError));
        }

        // call the close() library function
        ai_mClose(mHandle);
        return iError;
    }

The code is pretty self-explanatory. I’ll hit the high points:
Early in the code you can see the arrays for the IA32_MCi CTL, STATUS, ADDR, MISC and CTL2 registers. These are the error reporting bank registers.
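The lookup tables follow a regular pattern: each bank owns four consecutive MSRs starting at 0x400 (CTL, STATUS, ADDR, MISC), and the CTL2 registers are laid out one per bank starting at 0x280. As a minimal sketch, the same addresses can be computed rather than tabulated. The function and macro names below are my own, not part of the SED library, and the bank limit simply mirrors the 32 entries in the listing's tables.

```c
#include <assert.h>
#include <stdint.h>

#define MC_BANK_BASE 0x400u  /* IA32_MC0_CTL                         */
#define MC_CTL2_BASE 0x280u  /* IA32_MC0_CTL2                        */
#define MC_MAX_BANK  31u     /* highest bank covered by the tables   */

/* Offset of each register within a bank's group of four MSRs. */
enum mc_reg { MC_CTL = 0, MC_STATUS = 1, MC_ADDR = 2, MC_MISC = 3 };

/* Address of IA32_MCi_CTL / STATUS / ADDR / MISC for bank i. */
uint64_t mci_msr_addr(unsigned bank, enum mc_reg reg)
{
    assert(bank <= MC_MAX_BANK);
    return (uint64_t)MC_BANK_BASE + 4u * bank + (unsigned)reg;
}

/* Address of IA32_MCi_CTL2 for bank i. */
uint64_t mci_ctl2_addr(unsigned bank)
{
    assert(bank <= MC_MAX_BANK);
    return (uint64_t)MC_CTL2_BASE + bank;
}
```

For example, `mci_msr_addr(4, MC_STATUS)` evaluates to 0x411, which matches the fifth entry of `msr_ia32_mci_status[]`. The tables in the listing have the advantage of being easy to check against the SDM by eye, so which form to use is a matter of taste.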
Skipping down to the main() routine, you see some basic initialization of the platform, done for most routines of this type, invoking these functions:
- ai_mOpen
- ai_mSetTargetCPUType
- ai_mConfig
- ai_mGetITPScanChainTopology
- ai_mSetActiveCPU
- ai_mSetActiveCore
- ai_mSetActiveThread
After that, the dumpmca routine is invoked. And it’s really quite simple: just a sequence of ReadMSR calls that extract and print out the global control and error-reporting registers for the selected socket, core (thread) and bank.
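A natural extension of the same pattern is to walk every bank and flag the ones that actually logged something, rather than dumping one bank at a time. The sketch below shows that loop under a few assumptions: the reader is passed in as a callback (on the BMC it would be a thin wrapper around ai_mReadMSR, which is not shown here), the bank count comes from the Count field of IA32_MCG_CAP, and `fake_read_msr` is purely a stand-in so the loop can be tried off-target. None of these names exist in the SED library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reader callback: returns 0 on success, nonzero on failure.
   Hypothetical; on target this would wrap the library's MSR read. */
typedef int (*msr_read_fn)(void *ctx, uint64_t addr, uint64_t *value);

#define MCI_STATUS_VAL (1ULL << 63)   /* IA32_MCi_STATUS valid bit */

/* Read IA32_MCi_STATUS for banks 0..nbanks-1 and set bit i of *mask
   for each bank whose VAL bit is set. Stops at the first read error
   and returns its code; returns 0 if every read succeeded. */
int scan_valid_banks(msr_read_fn read_msr, void *ctx,
                     unsigned nbanks, uint32_t *mask)
{
    assert(read_msr != NULL && mask != NULL && nbanks <= 32);
    *mask = 0;
    for (unsigned bank = 0; bank < nbanks; bank++) {
        uint64_t status = 0;
        int rc = read_msr(ctx, 0x401u + 4u * bank, &status);
        if (rc != 0)
            return rc;
        if (status & MCI_STATUS_VAL)
            *mask |= (uint32_t)1 << bank;
    }
    return 0;
}

/* Stand-in reader for off-target experiments: pretends that only
   bank 4 (STATUS at 0x411) holds a valid error record. */
int fake_read_msr(void *ctx, uint64_t addr, uint64_t *value)
{
    (void)ctx;
    *value = (addr == 0x411) ? 0xBE00000000800400ULL : 0;
    return 0;
}
```

Calling `scan_valid_banks(fake_read_msr, NULL, 28, &mask)` leaves only bit 4 set in `mask`. Keeping the reader behind a callback means the scanning logic can be exercised on a workstation before it ever touches a scan chain.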
You can see that, armed with the man pages, source code samples and documentation, it’s pretty straightforward to understand what’s going on. Simple, huh?
Want to know more? I’ll have more source code and how-to tips coming in a future blog. In the meantime, check out our SED eBook. Or drop me a note in the Comments section.