Debugging, Reverse Engineering, and Malware Research – The Value of Better Tools: Part 2

In Part 1 of this series, I wrote about the difference between using printf and using advanced tools for debugging. In this edition, I’ll jump into the real value of debuggers and the return on investment of using better tools.

In Debugging, Reverse Engineering, and Malware Research – The Value of Better Tools: Part 1, I referred to the famous interview, available on YouTube, with John Carmack, lead programmer for Wolfenstein 3D, Doom, Quake, and many other AAA games.

John expressed surprise at the number of engineers who relied on archaic means of debugging their code.

Below I’ve paraphrased some of the most compelling statements in the interview:

Big companies can learn something from the hardcore game development side of things…

It’s amazing how hostile some large companies are to debuggers and IDEs.

The debugger is the way you gain insight into something that is just too big to understand.

You can still get some things done if you’re working with stone knives and bearskins…

It’s amazing what you can do with better tools.

Don’t try to run the code in your head. Your head is a faulty interpreter.

The way John delivers these pithy statements in the interview, you can’t help but laugh. I enjoyed his last line. The video is certainly a classic. And judging from the social media responses to my last article, the interview resonated with a lot of people, even though it was first published two years ago.

Although I no longer program for a living, I know that John’s philosophy is true. When I graduated from college years ago (I won’t say how many years), I worked on a game for the Apple II, which was ultimately ~50,000 lines of 6502 assembly language. I used a debugger (was it Merlin? It was so long ago) to do basic things like set breakpoints, single-step through the code, and examine register contents. I just took it for granted that I should use a debugger: how would you possibly debug something that big without one?

And later in life, I took up game programming again, this time as a hobby, and on Windows with DirectX. Development environments and debugging had come a long way. Two tremendous reference resources I had at my fingertips were the books:

Game Engine Architecture, Third Edition, by Jason Gregory

Introduction to 3D Game Programming with DirectX 12, by Frank D. Luna

Here’s a short video of a demo program that I built using the above as references:

I promise you: if you’re doing game development, you need a debugger. The same goes for any large-scale project, let alone a hypervisor or Windows itself.

Getting back to the topic at hand: why should management approve an engineer’s request for a debugger? If you scour the web, you won’t find much help there. There are few studies on the return on investment of debugging tools, yet managers often demand exactly that evidence before approving a purchase. I hope that the hypothetical example below will help. Your mileage will vary depending on what you’re trying to do, but here’s a reference example. We’ll look at it from the perspective of debugging Windows, but in the most general case it applies everywhere.


The Summary:

Hypervisor and operating system interactions with the silicon present the most complex and challenging technical bugs in the world.

Further, the increasing complexity and security requirements of Windows make it harder for everyone to root-cause issues quickly, maintain engineering productivity, and accelerate velocity-to-market for new features.

These issues can be resolved by adopting debugging technology that delivers groundbreaking functionality, so that bugs and vulnerabilities are found and resolved quickly. Let’s look at the top challenges of debugging today, assess the current situation, identify the needs, and describe the impacts of having better tooling:

Challenge #1: Debug the Undebuggable

Current Situation

Current debuggers cannot easily debug interactions with the firmware/silicon.

Identified Need

Advanced debugging capability for trap handlers, hypervisor transition code, KVA shadow, NMI, MCE, Hyper-V, Secure Kernel, VBS enclaves, Secure Boot, Shadow Stacks, and many others.

The Impacts

Catch “escapes” that manifest themselves in the field.

Reduce debugging time.

Challenge #2: Identify Vulnerabilities

Current Situation

Limited ability to analyze HV/SK/OS and malware behavior with comprehensive code coverage.

Identified Need

Complement static analysis with dynamic analysis using Intel Processor Trace, Architectural Event Trace (AET), LBR, etc.

The Impacts

Expand code coverage.

Visualize code behavior.

Insert faults, discover bugs.

Challenge #3: Gain Insight

Current Situation

Steep learning curve and reliance on “tribal knowledge” for insight into Windows internals.

Identified Need

Deepen the knowledge base of your team with powerful, easy-to-use debugging tools.

The Impacts

Dramatic reduction in team onboarding cost.

Debug everyday issues in a fraction of the time.


What is the ROI of adopting a better debugger? Let’s look at the cost side of the equation:

Assume that there are 10 engineers who need access to a best-in-class debugger.

For SourcePoint WinDbg, each engineer will need an AAEON UP Xtreme i11 or similar hardware target; let’s peg the cost at about $500 each. So, this cost is $5,000.

And the software license has a cost: let’s pick a hypothetical number of $8,000 for this. So, ten of these licenses have a total cost of $80,000.

There’s also a “human” cost to learning a new tool and adopting any new solution. Ten people learning something new has a cost that needs to be factored into any ROI analysis. I’m guessing here, but worst case let’s say it’s as much as the license cost: $80,000.

So, the grand total cost over one year is $165,000.

What are the benefits? To keep it simple, let’s presume that the team of ten deals with one critical bug per month, and estimate a fully loaded labor rate for a top-skilled engineer of $200/hour (note that this includes not just base salary but also benefits, taxes, and other employment-related costs). Each critical bug eats up one man-month of time, and with better tools we can resolve that issue in one day rather than one month. The cost of these critical bugs in one year is 1 bug/month × 12 months/year × $200/hour × 8 work hours/day × 20 work days/month = $384,000. If each fix drops from a month to a day, nearly all of that cost is recovered; the one remaining day per bug ($19,200 per year) is small enough to ignore for a back-of-the-envelope estimate.

To summarize:

Total cost (up front) = $165,000

Cost savings (per year) = $384,000

So, is it worth it? Should your company spend $165,000 to save $384,000 in one year?

Experienced finance people will recognize that the above analysis is quite simple – perhaps overly so. Software costs are sometimes amortized over a period of time, say three years. And a Net Present Value, Discounted Cash Flow analysis will also change the results a little bit. But in both cases it works in favor of making the investment. I speak from experience. Maybe I’ll do that analysis via spreadsheet sometime in the future.
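As a quick preview of that discounted-cash-flow idea, here’s a minimal Python sketch. The 10% discount rate and three-year horizon are my assumptions, not figures from the analysis above; only the $165,000 cost and $384,000 annual savings come from this article.

```python
# Hypothetical NPV sketch: one up-front tool cost against three years of
# annual savings, discounted at an ASSUMED 10% rate. The cost and savings
# figures are the article's illustrative numbers; the rate and horizon are
# placeholders -- substitute your own.
upfront_cost = 165_000      # hardware + licenses + training (year 0)
annual_savings = 384_000    # estimated yearly savings from faster bug fixes
discount_rate = 0.10        # assumed cost of capital
years = 3

# Discount each year's savings back to present value, then subtract the cost.
npv = -upfront_cost + sum(annual_savings / (1 + discount_rate) ** t
                          for t in range(1, years + 1))
print(f"NPV over {years} years: ${npv:,.0f}")
```

Even after discounting, the net present value stays comfortably positive, which is why the simple undiscounted comparison above doesn’t change the conclusion.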

This is a rough estimate, but it can certainly be tailored to your situation. Only have five critical bugs per year? Change the model. Do some critical bugs take longer, even up to six months to resolve? Or maybe you never fix them? Change the model. Run the model over a three-year or five-year period. See the results.
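To make the model easy to tweak, here’s the same arithmetic as a small Python function. The function name and parameter defaults are mine; the defaults reproduce the article’s numbers. Note that the strict savings figure subtracts the one remaining day per bug, so it comes out slightly under the headline $384,000.

```python
# Hypothetical ROI model for adopting a better debugger.
# All defaults are the article's illustrative assumptions -- change them
# to fit your own team, bug rate, and labor cost.

def debugger_roi(engineers=10,
                 target_hw_cost=500,       # hardware target per engineer
                 license_cost=8_000,       # software license per engineer
                 training_cost=80_000,     # worst-case team-wide learning cost
                 bugs_per_year=12,         # one critical bug per month
                 rate_per_hour=200,        # fully loaded labor rate
                 hours_per_day=8,
                 days_per_bug_before=20,   # one man-month per bug today
                 days_per_bug_after=1):    # one day per bug with better tools
    """Return (up-front cost, annual savings) for the tool investment."""
    cost = engineers * (target_hw_cost + license_cost) + training_cost
    day_cost = rate_per_hour * hours_per_day
    before = bugs_per_year * days_per_bug_before * day_cost
    after = bugs_per_year * days_per_bug_after * day_cost
    return cost, before - after

cost, savings = debugger_roi()
print(f"Up-front cost:    ${cost:,}")      # $165,000 with the defaults
print(f"Savings per year: ${savings:,}")   # $364,800 (= $384,000 minus one day per bug)
```

Five bugs a year instead of twelve? Call `debugger_roi(bugs_per_year=5)`. Six-month bugs? Set `days_per_bug_before=120`. The point of putting it in code is exactly that: change the model and see the results.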

It’s pretty clear that having better tools is of great value. Sometimes it’s worthwhile just seeing it in black and white.

This is of course only the cost side of the situation. There’s also the value of accelerating time-to-market: that is, if you solve the bugs sooner, you get your product to market faster, and that has tremendous value from a positive net cash flow perspective. And imagine the benefit of reducing field returns due to latent undetected or unresolved bugs. I’ll write about that next time.

Are you doing Windows application development, reverse engineering, or hypervisor development, or are you just curious about Windows internals? Check out our cost-effective Home and Enterprise editions of SourcePoint WinDbg.