JTAG debug of Windows Hyper-V / Secure Kernel with WinDbg and EXDI: Part 5

In the last couple of articles in this series, I’ve focused on basic run-control debugging used in conjunction with Intel Processor Trace (Intel PT). In this installment, we’ll start looking at the use of Architectural Event Trace (AET) to explore the Windows hypervisor, and how MSR accesses in particular are handled.

In the design of any hypervisor, performance and stability are extremely important: we want the Guest VM to handle as much of the workload as possible, and avoid unnecessary, expensive VM Exits. At the same time, Guest access to privileged instructions is restricted in the interest of security and isolation of the Guest.

As per the Intel SDM, Section 26.1.2, the following instructions unconditionally cause VM exits when they are executed in VMX non-root operation (that is, by the Guest):

CPUID, GETSEC, INVD, and XSETBV. This is also true of instructions introduced with VMX, which include: INVEPT, INVVPID, VMCALL, VMCLEAR, VMLAUNCH, VMPTRLD, VMPTRST, VMRESUME, VMXOFF, and VMXON.

Section 26.1.3 contains a list of instructions that cause VM Exits conditionally. It’s a long list, so today we’ll focus on the RDMSR and WRMSR (Read MSR and Write MSR) instructions. Here is an excerpt from Section 26.1.3:

RDMSR. The RDMSR instruction causes a VM exit if any of the following are true:

— The “use MSR bitmaps” VM-execution control is 0.

— The value of ECX is not in the ranges 00000000H – 00001FFFH and C0000000H – C0001FFFH.

— The value of ECX is in the range 00000000H – 00001FFFH and bit n in read bitmap for low MSRs is 1, where n is the value of ECX.

— The value of ECX is in the range C0000000H – C0001FFFH and bit n in read bitmap for high MSRs is 1, where n is the value of ECX & 00001FFFH.

Note: for a RDMSR, ECX contains the address of the MSR that is to be read from. The 64-bit output goes into the EDX and EAX registers, often written as EDX:EAX.

The corresponding conditions for WRMSR are symmetrical, using the write bitmaps in place of the read bitmaps.
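The four RDMSR conditions quoted above can be written down as a small decision function. This is only a sketch in Python: the names (`bit_n`, `rdmsr_causes_vm_exit`, `read_low`, `read_high`) are made up for illustration, and the two 1kB read bitmaps are modeled as plain byte arrays, with bit n taken to be bit (n mod 8) of byte (n div 8).

```python
# Sketch of the SDM's RDMSR VM-exit conditions. All names are illustrative.
LOW_MSRS = range(0x00000000, 0x00002000)    # 00000000H - 00001FFFH
HIGH_MSRS = range(0xC0000000, 0xC0002000)   # C0000000H - C0001FFFH


def bit_n(bitmap: bytes, n: int) -> int:
    """Return bit n of a bitmap, counting from bit 0 of byte 0."""
    return (bitmap[n // 8] >> (n % 8)) & 1


def rdmsr_causes_vm_exit(ecx: int, use_msr_bitmaps: bool,
                         read_low: bytes, read_high: bytes) -> bool:
    """Apply the conditions in the order the SDM excerpt lists them."""
    if not use_msr_bitmaps:
        return True                              # control is 0: always exit
    if ecx in LOW_MSRS:
        return bit_n(read_low, ecx) == 1         # n = ECX
    if ecx in HIGH_MSRS:
        return bit_n(read_high, ecx & 0x00001FFF) == 1   # n = ECX & 1FFFH
    return True                                  # ECX outside both ranges


# Example: two empty 1kB read bitmaps, then set the bit for MSR 10H.
read_low = bytearray(1024)
read_high = bytearray(1024)
read_low[0x10 // 8] |= 1 << (0x10 % 8)

print(rdmsr_causes_vm_exit(0x10, True, read_low, read_high))        # True
print(rdmsr_causes_vm_exit(0x11, True, read_low, read_high))        # False
print(rdmsr_causes_vm_exit(0x40000000, True, read_low, read_high))  # True
```

The WRMSR check would look the same, with the write bitmaps passed in place of the read bitmaps.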

This makes a fascinating topic to explore with SourcePoint WinDbg, as (1) we do have access to the VM-execution control information from within the VMCS, and (2) Intel AET can capture all RDMSR and WRMSR instructions as events (without halting the target), giving insight into which values are being read from or written to which MSRs, and in what code context.

Firstly, let’s understand more about RDMSR and WRMSR instructions in a hypervisor context, and “MSR bitmaps”. It would certainly be possible to create a hypervisor that traps every Guest invocation of these instructions and handles each one itself. This is indeed the default behavior when the “use MSR bitmaps” control is 0. But it would be inefficient and could hurt stability, so we want to avoid unnecessary VM Exits due to MSR accesses. This is accomplished with the aforementioned MSR bitmaps. The following two slides, taken from Satoshi Tanda’s hypervisor course (thanks again, Satoshi, great course!) summarize how these bitmaps work:

 

Maybe this clarifies it a little:

  • The MSR bitmap is allocated 4kB of memory.
  • 2kB each are associated with MSR Reads and Writes.
  • For RDMSR, 1kB (8,192 bits) is allocated to MSRs 0000 – 1FFF (and of course 0x1FFF = decimal 8,191), and the other 1kB to MSRs C0000000 – C0001FFF.

The charts above present visually what the SDM is saying. You can look at the individual bits in the bitmaps to understand whether reading or writing a particular MSR will cause a trap or not. For a given MSR, if the bitmap bit is ‘0’, there won’t be a VM Exit. If it is ‘1’, reading or writing that MSR will cause a VM Exit. We’ll see what that looks like below with SourcePoint WinDbg.
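To find the bit for a particular MSR in a memory dump, it helps to turn the layout into arithmetic. The sketch below assumes the region order the SDM gives for the 4kB page (read low, read high, write low, write high, 1kB each); the function and constant names are hypothetical.

```python
# Offsets of the four 1kB regions inside the 4kB MSR-bitmap page.
READ_LOW, READ_HIGH, WRITE_LOW, WRITE_HIGH = 0x000, 0x400, 0x800, 0xC00


def msr_bitmap_position(msr: int, write: bool = False):
    """Return (byte offset into the 4kB page, bit within that byte).

    Returns None when the MSR is outside both covered ranges, in which
    case the access causes a VM Exit regardless of the bitmap contents.
    """
    if 0x00000000 <= msr <= 0x00001FFF:
        region = WRITE_LOW if write else READ_LOW
    elif 0xC0000000 <= msr <= 0xC0001FFF:
        region = WRITE_HIGH if write else READ_HIGH
    else:
        return None
    n = msr & 0x1FFF                 # bit index within the 1kB region
    return region + n // 8, n % 8


for msr, write in [(0x10, False), (0x10, True), (0xC0000080, False)]:
    offset, bit = msr_bitmap_position(msr, write)
    kind = "write" if write else "read"
    print(f"MSR {msr:X}H {kind}: byte offset {offset:03X}H, bit {bit}")
```

Adding the byte offset to the MSR bitmap's physical address (taken from the VMCS) gives the address to inspect in a memory view.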

Secondly, it’s worthwhile knowing a little more about AET before proceeding. In my opinion, AET is one of the best debug utilities on Intel platforms, in fact on any platform. Intel Processor Trace is great for watching code flow, but AET truly complements it by showing the other side of the coin: what events are happening on the platform. Both traces are timestamped, and when used together (it can be challenging, as you can read in my blog here: WinDbg with correlated timestamps for Event and Instruction Trace) tremendous insight into the Windows internals is possible. For more information on AET, check out my article Intel Architectural Event Trace (AET) in action, or for a longer, more detailed treatment, watch the webinar I did with the UEFI Forum here: https://www.youtube.com/watch?v=pHSvcO0ogdc.

AET can only be activated by placing the target into probe mode. That is, JTAG is a prerequisite for using AET. Thus, an EXDI connection to a JTAG-based debugger, such as SourcePoint WinDbg, is the only mechanism for doing low-level x86 architectural event tracing within Windows.

With those two points covered, let’s jump in again! The other blogs in this series make for good background reading:

Part 1: VMM Breakpoint support

Part 2: The Secure Kernel with Symbols

Part 3: LBR and Intel PT in the Secure Kernel

Part 4: The VMCS, and altering it to enable Intel PT

With that background, you’ll know that we can easily break into the hypervisor itself, hvix64, as well as hvloader, securekernel (VTL 1) and the NTOS Guest (VTL 0). To show how easy it is to inspect the Windows internals, we halt the target using JTAG, and use our dump macro to dump selected VMCS fields while in Host mode:

Guest-state:
  RIP: FFFFF80749C64D70
  CR3: 0000000004600000
  IA32_DEBUG_CTL: 0000000000000000
  IA32_RTIT_CTL: 0000000000000000
  IA32_LBR_CTL: FFFFF80741F0A000
Host-state:
  Exception bitmap: 00060002
  I/O bitmap (0000-7fff) address: 0000000101403000
  I/O bitmap (8000-ffff) address: 0000000101404000
  MSR bitmap address: 000000010DC4D000
  EPT pointer: 00000001102F701E
  VPID: 0002
VM-execution:
  Pin-based: 0000003F
    B0: External-interrupt exiting: TRUE
  Processor-based primary: B6A06DFA
    B23: Move DR causes VM-exit: TRUE
    B24: Unconditional I/O exiting: FALSE
    B25: Use I/O bitmaps: TRUE
    B27: Monitor trap flag: FALSE
    B28: Use MSR bitmaps: TRUE
  Processor-based secondary: 001813AB
    B01: EPT enabled: TRUE
    B05: VPID enabled: TRUE
    B14: VMCS Shadowing: FALSE
    B19: Hide NR bit in Intel PT PIPs: TRUE
    B24: Intel PT uses Guest physical: FALSE
VM-entry:
  Primary: 000213FF
    B02: Load IA32_DEBUGCTL: TRUE
    B17: Conceal VM-entry from Intel PT: TRUE
    B18: Load IA32_RTIT_CTL: FALSE
    B21: Load Guest IA32_LBR_CTL: FALSE
  MSR load count: 00000000
VM-exit:
  Primary: 0103EFFF
    B02: Save IA32_DEBUGCTL: TRUE
    B24: Conceal VM-exit from Intel PT: TRUE
    B25: Clear IA32_RTIT_CTL: FALSE
    B26: Clear IA32_LBR_CTL: FALSE
  Secondary: 41F0A000
  MSR store count: 00000000
  MSR load count: 00000000
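
The TRUE/FALSE lines in the dump can be cross-checked against the raw control words by testing individual bits. Here is a minimal sketch in Python that uses only the bit positions and labels printed in the dump; the dictionary and function names are made up for illustration and are not part of SourcePoint or WinDbg.

```python
from typing import Dict

# Bit positions and labels copied from the dump above.
PRIMARY_BITS: Dict[int, str] = {
    23: "Move DR causes VM-exit",
    24: "Unconditional I/O exiting",
    25: "Use I/O bitmaps",
    27: "Monitor trap flag",
    28: "Use MSR bitmaps",
}
SECONDARY_BITS: Dict[int, str] = {
    1: "EPT enabled",
    5: "VPID enabled",
    14: "VMCS Shadowing",
    19: "Hide NR bit in Intel PT PIPs",
    24: "Intel PT uses Guest physical",
}


def decode_controls(value: int, names: Dict[int, str]) -> Dict[str, bool]:
    """Map each named bit of a 32-bit control word to True/False."""
    return {name: bool((value >> bit) & 1) for bit, name in names.items()}


primary = decode_controls(0xB6A06DFA, PRIMARY_BITS)
secondary = decode_controls(0x001813AB, SECONDARY_BITS)

for name, is_set in {**primary, **secondary}.items():
    print(f"{name}: {'TRUE' if is_set else 'FALSE'}")
```

Decoding B6A06DFA this way gives bit 28 set, which agrees with the “Use MSR bitmaps: TRUE” line.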

Note to interested readers: as this is currently beta software, we’re using a somewhat primitive method to dump the VMCS fields. The intent, for our next release, is to have the VMCS show up in our “Registers” window, making it much easier to view, understand and edit. An example of our Registers window is below:

There’s lots to unpack in the VMCS dump, but the “Use MSR bitmaps: TRUE” and “MSR bitmap address” lines above show us that MSR bitmaps are enabled on this platform. That makes a lot of sense: Microsoft, I presume, has carefully chosen which MSRs will and will not cause a VM Exit. The MSR bitmap itself is at physical address 0x10DC4D000.

So, let’s use the Memory window to dump the MSR bitmap:

Most of the bitmap is FF, which means that most MSR reads attempted by the Guest will trap to the hypervisor. But not all. As an example, look at the base address of the bitmap, 0x10DC4D000: the first three bytes, BC FF 7F, govern the first 24 MSRs (bit 0 of the first byte corresponds to MSR 0), yielding:

MSR                 Description                       Bitmap Value  Will cause VM Exit?
MSR 0H (0)          IA32_P5_MC_ADDR (P5_MC_ADDR)      0             Won’t exit
MSR 1H (1)          IA32_P5_MC_TYPE (P5_MC_TYPE)      0             Won’t exit
MSR 6H (6)          IA32_MONITOR_FILTER_SIZE          0             Won’t exit
MSR 10H (16)        IA32_TIME_STAMP_COUNTER (TSC)     1             Will exit
MSR 17H (23)        IA32_PLATFORM_ID                  1             Will exit

A Guest would presumably not need to read IA32_PLATFORM_ID, so it makes sense that it traps.

As for IA32_TIME_STAMP_COUNTER, it is virtualized by the hypervisor. The SDM, in Section 26.3, CHANGES TO INSTRUCTION BEHAVIOR IN VMX NON-ROOT OPERATION, says:

RDMSR. Section 26.1.3 identifies when executions of the RDMSR instruction cause VM exits. If such an execution causes neither a fault due to CPL > 0 nor a VM exit, the instruction’s behavior may be modified for certain values of ECX:

— If ECX contains 10H (indicating the IA32_TIME_STAMP_COUNTER MSR), the value returned by the instruction is determined by the setting of the “use TSC offsetting” VM-execution control:

  • If the control is 0, RDMSR operates normally, loading EAX:EDX with the value of the IA32_TIME_STAMP_COUNTER MSR.
  • If the control is 1, the value returned is determined by the setting of the “use TSC scaling” VM-execution control:

— If the control is 0, RDMSR loads EAX:EDX with the sum of the value of the IA32_TIME_STAMP_COUNTER MSR and the value of the TSC offset.

— If the control is 1, RDMSR first computes the product of the value of the IA32_TIME_STAMP_COUNTER MSR and the value of the TSC multiplier. It then shifts the value of the product right 48 bits and loads EAX:EDX with the sum of that shifted value and the value of the TSC offset.

The 1-setting of the “use TSC-offsetting” VM-execution control does not affect executions of RDMSR if ECX contains 6E0H (indicating the IA32_TSC_DEADLINE MSR). Such executions return the APIC-timer deadline relative to the actual timestamp counter without regard to the TSC offset.
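
The TSC rules quoted above boil down to a few lines of arithmetic. The sketch below is only a model of the quoted text: the function name and parameters are hypothetical, and truncating the result to 64 bits (so that a negative offset wraps the way it would in EDX:EAX) is an assumption on my part.

```python
MASK64 = (1 << 64) - 1


def guest_rdmsr_tsc(host_tsc: int, use_tsc_offsetting: bool,
                    use_tsc_scaling: bool, tsc_offset: int = 0,
                    tsc_multiplier: int = 1 << 48) -> int:
    """Value a Guest RDMSR of MSR 10H would return, per the quoted rules."""
    if not use_tsc_offsetting:
        return host_tsc & MASK64                    # RDMSR operates normally
    if not use_tsc_scaling:
        return (host_tsc + tsc_offset) & MASK64     # TSC + offset
    scaled = (host_tsc * tsc_multiplier) >> 48      # product shifted right 48
    return (scaled + tsc_offset) & MASK64           # shifted value + offset


def split_edx_eax(value: int):
    """Split a 64-bit result into the (EDX, EAX) halves RDMSR returns."""
    return (value >> 32) & 0xFFFFFFFF, value & 0xFFFFFFFF


# A multiplier of 2**47 halves the TSC rate seen by the Guest.
value = guest_rdmsr_tsc(1000, True, True, tsc_offset=10, tsc_multiplier=1 << 47)
print(value)                  # 510
print(split_edx_eax(value))   # (0, 510)
```

A multiplier of 2^48 leaves the rate unchanged, which is why it is used as the default here.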

Whew! This is worth some study on its own. I may one day write a light hypervisor of my own, so seeing how Hyper-V operates gives me a huge head start on my learning.