It has been quite an adventure over the last week. I’m getting intermittent segmentation faults during my Yocto Linux image builds. Could it be a problem with my new AMD Ryzen 7 1700X CPU?
In Episode 24, having built my new screamingly fast AMD Ryzen 7 1700X-based machine, I used Yocto to successfully build a new QEMU image in record time. But in last week’s Episode 25, I had mixed results. I did successfully build a Yocto image for my MinnowBoard, but unfortunately it failed to boot on my hardware. And when I tried to build a Yocto image for the Portwell Neptune Alpha, the build itself failed.
Last week, I presumed that the source of the problems was that I was building the images using Debian 9.1. I would always get the following message right after the bitbake started:
WARNING: Host distribution "Debian-9.1" has not been validated with this version of the build system; you may possibly experience unexpected failures. It is recommended that you use a tested distribution.
So I set about troubleshooting this, somewhat haphazardly, by first trying to install an earlier version of Debian on my build machine. My reasoning was that when I was doing builds under VirtualBox on my old PC, I was running Debian 8.2, and those builds worked. So, I tried that first.
Alas, Debian 8.2 refused to install on my new machine. I then tried 8.9, the most current “obsolete stable” release of Debian 8 (“jessie”). Both times I got the same error message at the beginning of the install, and then the installer just hung:
core perfctr but no constraints; unknown hardware!
I’m guessing that these older versions of Debian won’t work with the Ryzen 7 chip; the last release of jessie was dated July 22, 2017.
So, it’s back to Debian 9.1. At least I know that I can successfully install that version on my PC. Eventually, I expect the Yocto Project will test against this release and issue updates.
But, this time, when I tried to do the QEMU image build, it crashed!
Thanks to a reminder from my colleague, Adam Ley, I went into the log file at:
~/poky/build/tmp/work/x86_64-linux/qemu-native/2.8.0-r0/temp/log.do_compile.23003
and saw this error message:
/home/alan/poky/build/tmp/work/x86_64-linux/qemu-native/2.8.0-r0/qemu-2.8.0/tcg/tcg.c:2800:12: internal compiler error: Segmentation fault
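Digging through individual log files by hand gets tedious when failures are intermittent. As a minimal sketch (not part of Yocto, and assuming the logs live under the build’s tmp/work directory with names starting with log.do_compile, as mine did), something like the following Python could walk the tree and list every line that reports a compiler crash. The function name find_compiler_crashes is my own invention for illustration.

```python
import os

# Text that GCC prints when the compiler itself crashes.
MARKER = "internal compiler error"


def find_compiler_crashes(work_dir):
    """Return (path, line_number, text) for each crash line in do_compile logs."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(work_dir):
        for name in filenames:
            if not name.startswith("log.do_compile"):
                continue
            path = os.path.join(dirpath, name)
            # Skip symlinks so a log that is also reachable through a
            # "latest run" link is not counted twice.
            if os.path.islink(path):
                continue
            with open(path, errors="replace") as log:
                for line_number, text in enumerate(log, start=1):
                    if MARKER in text:
                        hits.append((path, line_number, text.strip()))
    return hits
```

Calling find_compiler_crashes(os.path.expanduser("~/poky/build/tmp/work")) would then return one entry per crash line, which makes it easy to see whether the faults land in the same file every time or move around from build to build.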
The “Segmentation fault” error got my attention. What’s that all about? I googled it and came across the article “New Ryzen Is Running Solid Under Linux, No Compiler Segmentation Fault Issue.” These segmentation faults apparently occurred on earlier Ryzen chips under heavy loads, such as large compiles on Linux.
Could I have possibly received an older part (manufactured prior to Week 25) that exhibits this fault under very special conditions? I’m going to do some more testing to see whether this was a coincidence or happens repeatedly. I’ve read that there’s a “Kill Ryzen” script that can reproduce the issue. If this is my situation, it’s reassuring to know that AMD has an RMA process for this issue.
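To tell a coincidence from a repeatable fault, the same command has to be run many times and the crashes counted. Here is a rough sketch of that idea in Python; it is not the “Kill Ryzen” script, just a hypothetical helper (count_segfaults is a name I made up) that treats a run as a crash if the process was killed by SIGSEGV or if its output mentions “Segmentation fault”.

```python
import signal
import subprocess


def count_segfaults(cmd, runs):
    """Run cmd `runs` times and return how many runs hit a segmentation fault."""
    crashes = 0
    for _ in range(runs):
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # fold stderr into stdout for one search
            text=True,
            errors="replace",
        )
        # On POSIX, a negative return code means the process died from a signal.
        killed_by_signal = result.returncode == -signal.SIGSEGV
        # GCC reports its own crash as text rather than dying with the signal.
        reported_in_output = "Segmentation fault" in result.stdout
        if killed_by_signal or reported_in_output:
            crashes += 1
    return crashes
```

For example, count_segfaults(["bitbake", "-c", "compile", "-f", "qemu-native"], 10) would force the qemu-native compile step ten times; the choice of recipe and run count here is only an assumption for illustration. A count that is sometimes zero and sometimes not, with nothing else changed, would point toward the hardware rather than the source code.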
My end goal, of course, is to have my build platform rock-solid, so I can build images for the Portwell Neptune Alpha board. This target is a development vehicle for OpenBMC, and supports the ASPEED AST2500 BMC, the most common service processor on cloud computing servers. Our ScanWorks Embedded Diagnostics team is using this board for its in-house development.