It’s been a few months since I did any work with my MinnowBoard; time seems to fly by during the summer. In this episode, I pick up where I left off, doing various Yocto builds as I work towards source-level debug of the Linux kernel. But I’m having mysterious build failures, with some of the same segmentation fault symptoms I had months ago, before I RMA’ed my AMD CPU. Could that problem be rearing its ugly head again?
As described in the MNW Chronicles Episodes 32, 33, 34 and 35, I had gotten fairly proficient at building various types of Linux images, and the problems I ran into in Episode 28: Returning my AMD Ryzen 7 1700X CPU seemed in the distant past. If you recall, I had received a very early sample of the new CPU from AMD, and after doing some research, I realized that this chip was susceptible to segmentation faults under heavy load. And, you don’t get a load much heavier than running all 16 threads simultaneously in a Yocto build. Fortunately, this had a happy ending – I returned my CPU to AMD, and they promptly and courteously supplied me with a new one. And the new chip ran like a champ, building Yocto images faster than I could imagine.
Well, I put my Yocto image builds to the side for a while, enjoying the summer, and a few things have happened since then. I did a major BIOS update (supplied by ASUS, who gets it in turn from AMI), presumably to address the Meltdown/Spectre vulnerabilities. I also managed to totally wipe out the main Ubuntu 16.04.5 LTS VM that I was using under VirtualBox, so I had to recreate that. But, when I tried to run a new Ubuntu VM, I got the below error:
It took some digging, but I finally realized that somehow the BIOS update had disabled virtualization (AMD-V). It was necessary to go into the BIOS setup menu, go to Processor Configuration, and then change SVM to Enabled:
That did the trick! I was back in business.
But after many, many tries, I could never get a build to complete successfully. The builds would always start cleanly and execute hundreds, if not thousands, of tasks. But when I came back to check on the status of the build, the majority of the time the PC had simply rebooted. I couldn’t find many useful breadcrumbs to help me diagnose the problem in these instances. A few times, the build would just error out without rebooting, and I could see a segfault:
Circled in red is an “internal compiler error: Segmentation fault” error message.
I was able to use the “dmesg” command to get a printout of the last kernel error messages prior to the failure:
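When the kernel log is long, a small filter can make the relevant entries easier to spot. The following is only a sketch, not what was run here: it assumes a Linux system where `dmesg` is readable by the current user (some distributions restrict it to root), and the keyword list is my own guess at what is worth flagging, not an exhaustive set.

```python
from __future__ import annotations

import re
import subprocess

# Assumed keywords of interest; adjust to taste. This list is illustrative only.
PATTERN = re.compile(r"segfault|general protection|machine check|mce:", re.IGNORECASE)


def filter_crash_lines(dmesg_text: str) -> list[str]:
    """Return only the lines of kernel log text that match a crash-related keyword."""
    return [line for line in dmesg_text.splitlines() if PATTERN.search(line)]


def read_dmesg() -> str:
    """Run dmesg and return its output; may be empty if access is restricted."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    for line in filter_crash_lines(read_dmesg()):
        print(line)
```

Note that the kernel ring buffer does not survive a reboot, so this only helps in the cases where the build errored out and the machine stayed up.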
On a number of occasions, I would try to relaunch bitbake, and the build would continue for a while (a few hundred tasks), but then fail again. I’d come back to see the Ubuntu login screen.
Sometimes it would fail saying that “only one copy of bitbake should be run against a build directory”, as you can see in the previous image. I’m guessing that the segfault has left the build in an inconsistent state. There is a bitbake “lock” file in the poky/build/tmp directory that is probably stopping the new build from launching:
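Before touching a lock file, it seems prudent to confirm that no bitbake process is actually still alive. The sketch below shows one way to check; it is an illustration under assumptions, not part of the Yocto tooling. The file name `bitbake.lock`, the two candidate locations, and the `poky/build` path in the example are assumptions based on my setup, and the script deliberately only reports what it finds rather than deleting anything.

```python
from __future__ import annotations

from pathlib import Path

LOCK_NAME = "bitbake.lock"  # assumed lock file name


def find_lock_files(build_dir: Path) -> list[Path]:
    """Return lock files found in the build directory or its tmp subdirectory."""
    candidates = [build_dir / LOCK_NAME, build_dir / "tmp" / LOCK_NAME]
    return [p for p in candidates if p.is_file()]


def bitbake_is_running(proc_root: Path = Path("/proc")) -> bool:
    """Scan /proc for a process whose program or script name starts with 'bitbake'."""
    for entry in proc_root.glob("[0-9]*"):
        try:
            cmdline = (entry / "cmdline").read_bytes()
        except OSError:
            continue  # process exited, or we lack permission to read it
        # Only look at the interpreter and script name, not the later arguments.
        args = [a for a in cmdline.split(b"\0") if a][:2]
        if any(Path(a.decode(errors="replace")).name.startswith("bitbake") for a in args):
            return True
    return False


if __name__ == "__main__":
    build = Path("poky/build")  # hypothetical location; point this at your build dir
    locks = find_lock_files(build)
    if not locks:
        print("No lock file found")
    elif bitbake_is_running():
        print("bitbake still appears to be running; leave the lock alone")
    else:
        for path in locks:
            print("Possibly stale lock:", path)
```

If the script reports a possibly stale lock and no running bitbake, removing the file by hand and relaunching the build is the obvious next thing to try.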
What’s going on? My first guess is that some incompatibility has been introduced in the last few months and is making the build unstable. On the other hand, a segfault would seem to be symptomatic of a hardware or firmware failure: maybe a flaky DIMM, or maybe my CPU isn’t seated quite right in the socket, or any of a number of different things. I think the first thing I need to do is determine whether the problem is hardware or software, by going back to the “Kill-Ryzen” script I ran months ago, which first identified the problem. Kill-Ryzen stresses the system with a highly parallel workload, and should crash my PC if there’s a basic hardware fault unrelated to the actual Yocto build process. Will this identify the problem? Stay tuned!