The MinnowBoard Chronicles Episode 27: Segfault on my AMD Ryzen 7 1700X

Last week, I suspected that I might be seeing segmentation fault failures on my new AMD Ryzen 7 1700X computer. I dug into this some more this week, and learned a lot!

I’m pretty conservative when it comes to calling suppliers with problems regarding my electronics at home. I tend to want to dig into the issue and try to figure it out myself, often spending hours doing so. Some might call this a waste of time, but I usually learn a lot in the process. And as an outcome, I really know what I’m talking about when it comes time to call Tech Support. My wife thinks that this inclination is related to my refusal to ask for directions when I’m driving. Fortunately, in this era of Google Maps and built-in navigation systems, the latter is no longer an issue.

So, when my new AMD-based PC started throwing segmentation faults during my Yocto Linux builds (see Episode 26), I figured I should dig into it a little bit first. As I was tinkering around, I got a notification on my Debian 9.1 home page that a new release was available. I clicked on the “Updates” button, and soon enough, I had Debian 9.2 on board.

Also interestingly, about mid-week last week, I noticed that the Yocto documentation at http://www.yoctoproject.org/docs/2.4/yocto-project-qs/yocto-project-qs.html had been updated to version 2.4 (“Rocko”), while before I was using version 2.3 (“Pyro”). So, I had to do a little work to get onto the new release.

I then jumped in with both feet, and did a build for the Portwell Neptune Alpha, and it succeeded! And I no longer got the warnings about Debian incompatibility, so between the jump to Rocko and the update to Debian 9.2, that somehow resolved itself. Very encouraging!

Emboldened, I backed up and then did a build for QEMU (Quick Emulator). But, it crashed with a segmentation fault!

Finished binary package job, result 0, filename /home/alan/poky/build/tmp/work/i586-poky-linux/gcc-runtime/7.2.0-r0/deploy-rpms/i586/libssp-dev-6.2.0-r0.i586.rpm

Segmentation fault

WARNING: exit code 139 from a shell command

DEBUG: Python function do_package_rpm finished

DEBUG: Python function do_package_write_rpm finished

: Function failed: BUILDSPEC (log file is located at /home/ala/poky/build/tmp/work/i586-poky-linux/gcc-runtime/7.2.0-r0/temp/log.do_package_write_rpm.30372
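A side note on the "exit code 139" in the warning above: Bourne-style shells conventionally report a command killed by signal N as exit status 128 + N, and 139 - 128 = 11, which is SIGSEGV on Linux. The helper below is just my own illustrative sketch of that arithmetic (the function name is made up, and it is not part of Yocto or BitBake). Keep in mind that a program can also exit with a status above 128 on its own, so this is a convention, not a guarantee.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Explain a shell-style exit status (0-255).

    Hypothetical helper: assumes the common shell convention that a
    status above 128 means "killed by signal (status - 128)".
    """
    if status == 0:
        return "success"
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"unknown signal {signum}"
        return f"killed by {name}"
    return f"failed with exit code {status}"


# 139 is the code from the Yocto warning: 139 - 128 = 11.
print(describe_exit_status(139))
```

On a Linux machine this reports SIGSEGV for 139, which matches the "Segmentation fault" line printed just before the warning.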

I then ran several different builds, for QEMU, the MinnowBoard, and the Neptune Alpha; sometimes a build would succeed, but mostly it would fail. So, it was time to get more rigorous on this. Having read the article “Ryzen Is Running Solid Under Linux, No Compiler Segmentation Fault Issue” and about the Kill Ryzen script, I began to suspect that maybe there was something wrong with my CPU, and that it was an older model. So I used Git to download the Kill Ryzen script, and ran it:

[Screenshot: Kill Ryzen script output]

Yes, it crashed after five minutes. And this happened repeatedly.
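For anyone who wants to apply the same "run it over and over until it breaks" idea to their own build command, here is a minimal sketch. To be clear, this is not the Kill Ryzen script and not how it is implemented; the function name and the stand-in command are my own placeholders. It runs a command repeatedly and reports the first run that fails.

```python
import subprocess
import sys


def run_until_failure(cmd, max_runs=10):
    """Run cmd up to max_runs times.

    Returns (run_number, returncode) for the first failing run, or None
    if every run succeeded. On POSIX systems, subprocess reports a child
    killed by signal N as returncode -N (so a segfault shows up as -11).
    """
    for run in range(1, max_runs + 1):
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            return run, result.returncode
    return None


# Stand-in command (an assumption for illustration): a trivial Python
# invocation that always succeeds. Replace it with a real build command.
demo_cmd = [sys.executable, "-c", "pass"]
print(run_until_failure(demo_cmd, max_runs=3))
```

An intermittent fault like mine would show up as a failure on some run other than the first, and on a different run each time.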

But the screenshot didn’t say why it crashed. From the Kill-Ryzen script’s README.md, I found that I had to look in /mnt/ramdisk/workdir/buildloop.d/loop-6/build.log to see the details behind the failure (the “6” comes from the “loop-6” failure reported in the screenshot above).

And the failure logged was, indeed, a segmentation fault, as can be seen in the “checking whether gcc accepts -g” line among the last lines of the log below:

checking for suffix of executables...

checking for suffix of object files... o

checking whether we are using the GNU C compiler... yes

checking whether gcc accepts -g... /bin/bash: line 22:  2529 Segmentation fault      /bin/bash $s/$module_srcdir/configure --srcdir=${topdir}/$module_srcdir --cache-file=./config.cache '--disable-multilib' '--enable-languages=c,c++,fortran,lto,objc' --program-transform-name='s,y,y,' --disable-option-checking --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --disable-intermodule --enable-checking=yes,types --disable-coverage --enable-languages="c,c++,lto" --disable-build-format-warnings

Makefile:12563: recipe for target 'configure-stage1-libdecnumber' failed

make[2]: *** [configure-stage1-libdecnumber] Error 139

make[2]: Leaving directory '/mnt/ramdisk/workdir/buildloop.d/loop-6'

Makefile:27079: recipe for target 'stage1-bubble' failed

make[1]: *** [stage1-bubble] Error 2

make[1]: Leaving directory '/mnt/ramdisk/workdir/buildloop.d/loop-6'

Makefile:941: recipe for target 'all' failed

make: *** [all] Error 2
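Digging through a long build.log by eye gets old quickly, so here is a small sketch of how the telltale lines could be pulled out automatically. The function name and the two patterns it searches for ("Segmentation fault" and make's "Error 139") are my own choices based on the log above, not anything provided by the Kill Ryzen script, and the sample text is an abbreviated copy of that log.

```python
import re

# Patterns taken from the log above: the shell's message and make's
# report of exit status 139 (128 + signal 11).
SEGFAULT_PATTERN = re.compile(r"Segmentation fault|Error 139\b")


def find_segfault_lines(log_text):
    """Return (line_number, line) pairs for lines that look like segfaults."""
    hits = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        if SEGFAULT_PATTERN.search(line):
            hits.append((lineno, line.strip()))
    return hits


# Abbreviated excerpt of the build log shown above (long lines shortened).
sample = """\
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... /bin/bash: line 22:  2529 Segmentation fault
make[2]: *** [configure-stage1-libdecnumber] Error 139
make[1]: *** [stage1-bubble] Error 2
"""

for lineno, line in find_segfault_lines(sample):
    print(lineno, line)
```

To use it on a real log, read the file's contents into a string and pass that in place of the sample. Note that the follow-on "Error 2" lines are deliberately ignored, since they are just make propagating the original failure upward.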

So, it is time to contact AMD. I placed a ticket in their online support system. Let’s keep our fingers crossed!

Why am I doing all this? Well, partly it’s a public service, and partly it’s because I’m doing a lot of Linux builds as I explore OpenBMC for the ASPEED AST2500 for our ScanWorks for Embedded Diagnostics product line. In particular, I’m interested in applying boundary-scan test technology on Intel-based servers using the ASPEED BMC. You can read more about the power of in-situ JTAG-based boundary-scan test in our eBook, Embedded JTAG for Boundary-Scan Test (note: requires registration).