Last week, I suspected that I might be seeing segmentation faults on my new AMD Ryzen 7 1700X computer. I dug into this some more this week and learned a lot!
I'm pretty conservative when it comes to calling suppliers about problems with my electronics at home. I tend to want to dig into the issue and try to figure it out myself, often spending hours in the process. Some might call this a waste of time, but I usually learn a lot along the way. And as a result, I really know what I'm talking about when it comes time to call Tech Support. My wife thinks this inclination is related to my refusal to ask for directions when I'm driving. Fortunately, in this era of Google Maps and built-in navigation systems, the latter is no longer an issue.
So, when my new AMD-based PC started throwing segmentation faults during my Yocto Linux builds (see Episode 26), I figured I should dig into it a little first. As I was tinkering around, I got a notification on my Debian 9.1 home page that a new release was available. I clicked the "Updates" button, and soon enough I had Debian 9.2 on board.
Also, about midweek last week, I noticed that the Yocto documentation at http://www.yoctoproject.org/docs/2.4/yocto-project-qs/yocto-project-qs.html had been updated to version 2.4 ("Rocko"); before that, I had been using version 2.3 ("Pyro"). So I had to do a little work to move to the new release.
I then jumped in with both feet and did a build for the Portwell Neptune Alpha, and it succeeded! I also no longer got the warnings about Debian incompatibility, so between the move to Rocko and the update to Debian 9.2, that problem somehow resolved itself. Very encouraging!
Emboldened, I backed up and then did a build for QEMU (Quick Emulator). But it crashed with a segmentation fault!
Finished binary package job, result 0, filename /home/alan/poky/build/tmp/work/i586-poky-linux/gcc-runtime/7.2.0-r0/deploy-rpms/i586/libssp-dev-6.2.0-r0.i586.rpm
Segmentation fault
WARNING: exit code 139 from a shell command
DEBUG: Python function do_package_rpm finished
DEBUG: Python function do_package_write_rpm finished
: Function failed: BUILDSPEC (log file is located at /home/alan/poky/build/tmp/work/i586-poky-linux/gcc-runtime/7.2.0-r0/temp/log.do_package_write_rpm.30372)
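The "exit code 139" in that warning is worth decoding. By the usual POSIX shell convention, a child process killed by signal N is reported with exit status 128 + N, and 139 - 128 = 11, which is SIGSEGV on Linux. Here is a minimal Python sketch of that decoding; the helper name `describe_shell_exit` is hypothetical, and the 128 + N rule is a shell convention rather than anything specific to BitBake or Yocto.

```python
import signal


def describe_shell_exit(code: int) -> str:
    """Explain a shell-style exit status (hypothetical helper).

    POSIX shells report a child killed by signal N as status 128 + N.
    A status above 128 is only a strong hint, since a program can also
    choose to exit with such a value directly.
    """
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name  # e.g. 11 -> "SIGSEGV" on Linux
        except ValueError:
            name = f"unknown signal {signum}"
        return f"killed by signal {signum} ({name})"
    return f"exited normally with status {code}"


if __name__ == "__main__":
    # 139 is the status in the Yocto warning above.
    print(describe_shell_exit(139))
```

If the same crashing tool were launched from Python's `subprocess` module instead of through a shell, the event would show up as a `returncode` of -11 rather than 139.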
I then ran several different builds, for QEMU, the MinnowBoard, and the Neptune Alpha; sometimes they failed and sometimes they succeeded, but mostly they failed. So it was time to get more rigorous about this. Having read the articles at Ryzen Is Running Solid Under Linux, No Compiler Segmentation Fault Issue and about the Kill Ryzen script, I began to suspect that something might be wrong with my CPU, perhaps because it was one of the older units. So I used Git to download the Kill Ryzen script and ran it:
Sure enough, it crashed after about five minutes, and it did so repeatedly.
But the screenshot didn't say why it crashed. The Kill Ryzen script's README.md explained that I had to look in /mnt/ramdisk/workdir/buildloop.d/loop-6/build.log to see the details behind the failure (the "6" comes from the "loop-6" failure reported in the screenshot above).
And the logged failure was, indeed, a segmentation fault, as the last lines of the log show (note the line reporting the "Segmentation fault"):
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... /bin/bash: line 22: 2529 Segmentation fault /bin/bash $s/$module_srcdir/configure --srcdir=${topdir}/$module_srcdir --cache-file=./config.cache '--disable-multilib' '--enable-languages=c,c++,fortran,lto,objc' --program-transform-name='s,y,y,' --disable-option-checking --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --disable-intermodule --enable-checking=yes,types --disable-coverage --enable-languages="c,c++,lto" --disable-build-format-warnings
Makefile:12563: recipe for target 'configure-stage1-libdecnumber' failed
make[2]: *** [configure-stage1-libdecnumber] Error 139
make[2]: Leaving directory '/mnt/ramdisk/workdir/buildloop.d/loop-6'
Makefile:27079: recipe for target 'stage1-bubble' failed
make[1]: *** [stage1-bubble] Error 2
make[1]: Leaving directory '/mnt/ramdisk/workdir/buildloop.d/loop-6'
Makefile:941: recipe for target 'all' failed
make: *** [all] Error 2
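Rather than opening each loop's build.log by hand, the logs can be scanned for the failure line. The following is a minimal Python sketch, assuming the buildloop.d/loop-N/build.log layout described above; the function name `find_segfaults` is hypothetical and not part of the Kill Ryzen script.

```python
from pathlib import Path


def find_segfaults(workdir: str) -> dict[str, list[str]]:
    """Map each loop directory name to its log lines mentioning a segfault.

    Hypothetical helper; assumes one build.log per loop-N directory
    under <workdir>/buildloop.d, as in the path quoted above.
    """
    hits: dict[str, list[str]] = {}
    for log in sorted(Path(workdir, "buildloop.d").glob("loop-*/build.log")):
        lines = log.read_text(errors="replace").splitlines()
        bad = [line for line in lines if "Segmentation fault" in line]
        if bad:
            hits[log.parent.name] = bad  # e.g. "loop-6"
    return hits


if __name__ == "__main__":
    for loop, lines in find_segfaults("/mnt/ramdisk/workdir").items():
        print(f"{loop}: {len(lines)} segfault line(s)")
        print("   ", lines[-1])
```

A plain substring match keeps the sketch simple; it assumes the failure text looks like the bash message above, so other crash signatures would need their own patterns.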
So it is time to contact AMD. I submitted a ticket through their online support system. Let's keep our fingers crossed!
Why am I doing all this? Well, partly it's a public service, as I'm doing a lot of Linux builds while I explore OpenBMC on the ASPEED AST2500 for our ScanWorks for Embedded Diagnostics product line. In particular, I'm interested in applying boundary-scan test technology to Intel-based servers using the ASPEED BMC. You can read more about the power of in-situ JTAG-based boundary-scan test in our eBook, Embedded JTAG for Boundary-Scan Test (note: requires registration).