PCI Express (PCIe) is one of several buses I’ve been working on with the ScanWorks High-Speed I/O (HSIO) products. We’ve had tools in ScanWorks to test PCIe in various ways since the Intel server chipset code-named Twincastle, back in the 2005 timeframe. Over that time, we’ve often found it hard to make our support robust enough to handle whatever endpoint a customer might choose for their system.
Sometimes we’ve found software bugs and sometimes we’ve found hardware or silicon problems, but the biggest problem always seems to be getting the endpoint into loopback mode properly.
Both the Bit-Error-Rate (BER) and margining tests are performed with the endpoint in slave loopback mode. Because loopback is defined in the PCIe specification from PCI-SIG, we don’t need any special designed-in DFT features or any other access to the endpoint. The loopback master device requests entry into slave loopback during the training sequence. Once loopback has been achieved, the endpoint acting as loopback slave should loop all data it receives directly from Rx to Tx and send it back to the loopback master. For PCIe Gen2, this behavior was (somewhat loosely) defined in section 4.2.5.10 of the base specification, but it wasn’t used in any of the required compliance tests. For PCIe Gen3, the definition in the spec leaves less room for interpretation, and slave loopback mode is used during compliance testing, giving us a much more standardized and reliable loopback implementation across vendors.
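To make the idea concrete, here is a minimal sketch of how a loopback master could turn looped-back data into a BER figure. It is an illustration only, not how ScanWorks or any PCIe device does it. The function names, the PRBS7-style test pattern, and the simulated slave that flips bits at chosen positions are all assumptions made for the example.

```python
from typing import Iterable, List, Sequence


def prbs7(n_bits: int, seed: int = 0x7F) -> List[int]:
    """Return n_bits from a 7-bit LFSR (PRBS7-style pattern, period 127).

    The pattern choice is illustrative; it is not taken from the PCIe spec.
    """
    if not 0 < seed < 0x80:
        raise ValueError("seed must be a nonzero 7-bit value")
    state = seed
    bits = []
    for _ in range(n_bits):
        # Feedback is the XOR of the two most significant state bits.
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        bits.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F
    return bits


def simulated_loopback(bits: Sequence[int], flip_at: Iterable[int] = ()) -> List[int]:
    """Hypothetical loopback slave: echo Rx to Tx, corrupting chosen positions."""
    echoed = list(bits)
    for i in flip_at:
        echoed[i] ^= 1  # model a single bit error on the link
    return echoed


def bit_error_rate(sent: Sequence[int], received: Sequence[int]) -> float:
    """Fraction of bit positions where the returned data differs from what was sent."""
    if len(sent) != len(received):
        raise ValueError("sent and received must have the same length")
    if not sent:
        raise ValueError("cannot compute BER on an empty pattern")
    errors = sum(s != r for s, r in zip(sent, received))
    return errors / len(sent)


if __name__ == "__main__":
    pattern = prbs7(1000)
    returned = simulated_loopback(pattern, flip_at=[10, 500])
    print(f"BER = {bit_error_rate(pattern, returned):.3e}")
```

The key point is that the master needs nothing from the endpoint except a faithful echo. If the slave never enters loopback, or modifies the data on the way through, the comparison is meaningless, which is why proper loopback entry matters so much for this kind of test.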
ASSET joined the PCI-SIG early last year to better understand the PCIe vendors’ world and to help them understand how these issues with slave loopback mode are affecting their end customers. Since then we have attended several PCI-SIG Compliance Workshops, gathering data on loopback issues and, in many cases, discussing those issues directly with the engineers responsible for the PCIe implementation. During our first PCI-SIG workshop, we were able to test 22 PCIe cards using our ScanWorks HSIO for Intel Architecture (IA) tool. Unfortunately, only 6 of those 22 cards worked properly with this embedded test. At the last workshop we attended, we performed 32 tests and 19 worked properly. That’s 59% working vs. only 27% at our first workshop just a year ago. In addition, we saw a lot more Gen3 devices at this last workshop, and over 80% of them passed our test!
Validation of the PCIe bus has come a long way, even in just the past year. Validating with embedded instruments is faster and less cumbersome than other methods, but it relies on proper interpretation and implementation of key DFx features such as slave loopback for PCIe. I’m happy to say we’re seeing a significant shift in the industry toward a robust and standardized implementation with PCIe Gen3 validation.