- vmm.ko will create a vmspace for guest physical address mappings to host
  memory.

- Each contiguous memory segment in the guest physical address space will be
  represented by an object of type OBJT_DEFAULT in the host.

- When the guest traps into the hypervisor for an EPT violation we call
  vm_fault(guest_vmspace->vm_map, gpa, ftype, VM_FAULT_NORMAL);

- Identify the type of the page tables in the pmap:

      enum pmap_type {
              PT_X86 = 0,     /* regular x86 page tables */
              PT_EPT,         /* Intel's nested page tables */
              PT_RVI,         /* AMD's nested page tables */
      };

      struct pmap {
              ...
              enum pmap_type  pm_type;        /* regular or nested tables */
              long            pm_eptgen;      /* EPT pmap generation id */
              ...
      };

- vmspace_alloc() will by default create a pmap using 'pmap_pinit()'. Nested
  pmaps will be created by passing a non-NULL pmap_pinit_t function pointer
  to 'vmspace_alloc()'. For example, ept_pinit() is used to create a PT_EPT
  pmap.

- A virtual machine will allocate a pmap with the appropriate pmap_type. A
  vcpu will have the appropriate hostcpu set in 'pmap->pm_active' when it is
  active.

- The kernel address space mappings (e.g. recursive, kva, direct map) will
  only be installed for pmap_type = PT_X86 in pmap_pinit(). These mappings
  make no sense in a nested page table.

- Each vcpu will maintain a cache of 'pmap->pm_eptgen' and will invalidate
  the EPT mappings in the TLB if its copy is out-of-date.

      pmap_invalidate_page():
              if (pmap->pm_type == PT_EPT) {
                      pmap->pm_eptgen++;
                      /* force active vcpus to exit the guest */
                      ipi_selected(pmap->pm_active, IPI_JUSTRETURN);
              }

  pmap_invalidate_range() and pmap_invalidate_all() will do the same thing.

  VM entry pseudo-code:
      intr_disable();
      pmap->pm_active |= (1 << curcpu);
      if (vcpu->eptgen != pmap->pm_eptgen) {
              vcpu->eptgen = pmap->pm_eptgen;
              invept(vm->ept);        /* flush stale guest mappings */
      }
      vmresume

- Guest physical addresses will range between [0, VM_MAXUSER_ADDRESS). This
  means that from the point of view of the pmap these addresses can be
  treated the same as the UVA of a process.

- PTE bits:

  bit  Host (PT_X86)  Guest (PT_EPT)
  ---  -------------  --------------
   0   PG_V           EPT_PG_RD
   1   PG_RW          EPT_PG_WR
   2   PG_U           EPT_PG_EX
   3   PG_NC_PWT      EPT_PG_MEMORY_TYPE[0]  (must be 0 for non-terminal mappings)
   4   PG_NC_PCD      EPT_PG_MEMORY_TYPE[1]  (must be 0 for non-terminal mappings)
   5   PG_A           EPT_PG_MEMORY_TYPE[2]  (must be 0 for non-terminal mappings)
   6   PG_M           EPT_PG_IGNORE_PAT      (must be 0 for non-terminal mappings)
   7   PG_PS          EPT_PG_SUPERPAGE       (superpage entry)
       PG_PTE_PAT     ignored                (regular page entry)
   8   PG_G           EPT_PG_ACCESSED        (optional)
   9   PG_AVAIL1      EPT_PG_MODIFIED        (optional, terminal mappings only)
  10   PG_AVAIL2      available for software
  11   PG_AVAIL3      available for software
  12   PG_PDE_PAT     must be 0              (superpage only)
  63   PG_NX          ignored

  PG_V and PG_RW map cleanly to EPT_PG_RD and EPT_PG_WR respectively.

  PG_U maps to EPT_PG_EX, which works out fortuitously since the guest
  physical addresses are coincident with the user virtual addresses.

  The cache mode bits are different, but that can be rectified by adding a
  'pmap_t' argument to pmap_cache_bits().

  PG_W currently uses PG_AVAIL1, which conflicts with EPT_PG_MODIFIED.
  Therefore PG_W should be changed to use PG_AVAIL3 instead.

  Rename the macros PG_A, PG_M and PG_G to _X86_PG_A, _X86_PG_M and _X86_PG_G
  respectively. PG_A, PG_M and PG_G will no longer be macros but variables
  initialized at runtime based on the pmap type. For example:

      int
      some_pmap_func(pmap_t pmap)
      {
              pt_entry_t PG_A = pmap_accessed_bit(pmap);
              pt_entry_t PG_M = pmap_modified_bit(pmap);
              pt_entry_t PG_G = pmap_global_bit(pmap);
              pt_entry_t PG_PTE_PROMOTE = pmap_pte_promote_bits(pmap);
              ...
      }
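
  As an illustration of the runtime-selected bits (a sketch, not code from
  the bhyve tree), pmap_accessed_bit() could select the bit based on the
  pmap type, assuming the _X86_PG_A rename above and the EPT_PG_ACCESSED
  definition from the PTE bit table. PT_RVI reuses the x86 bit because AMD's
  nested page tables use the regular x86 PTE format:

      /* Illustrative sketch only; names follow the proposal above. */
      static __inline pt_entry_t
      pmap_accessed_bit(pmap_t pmap)
      {
              pt_entry_t mask;

              switch (pmap->pm_type) {
              case PT_X86:
              case PT_RVI:    /* AMD nested tables use the x86 PTE format */
                      mask = _X86_PG_A;
                      break;
              case PT_EPT:
                      mask = EPT_PG_ACCESSED;
                      break;
              default:
                      panic("pmap_accessed_bit: invalid pm_type %d",
                          pmap->pm_type);
              }
              return (mask);
      }

  pmap_modified_bit() and pmap_global_bit() would presumably follow the same
  pattern with _X86_PG_M/EPT_PG_MODIFIED and _X86_PG_G; EPT has no global
  bit, so the PT_EPT case of pmap_global_bit() would return 0.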

- Older Intel processors (pre-Haswell) do not support the accessed/dirty bits
  in the EPT. We need to do software emulation of the A/D bits when executing
  a guest on these processors. More details are available here:
  http://people.freebsd.org/~neel/bhyve/accessed_dirty_emulation.txt
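
  As a concrete starting point for deciding when the emulation is needed:
  the Intel SDM reports support for the EPT accessed/dirty flags in bit 21
  of the IA32_VMX_EPT_VPID_CAP MSR (index 0x48c). The sketch below is
  illustrative only; the helper name and macro definitions are ours, not
  from the bhyve sources:

      /* rdmsr() is available from <machine/cpufunc.h> in the kernel. */
      #define MSR_VMX_EPT_VPID_CAP    0x48c           /* hypothetical name */
      #define EPT_VPID_CAP_AD_BITS    (1UL << 21)     /* hypothetical name */

      /*
       * Returns false on processors (e.g. pre-Haswell) that lack hardware
       * A/D bits in the EPT, telling the hypervisor to fall back to the
       * software emulation described above.
       */
      static bool
      ept_has_ad_bits(void)
      {
              return ((rdmsr(MSR_VMX_EPT_VPID_CAP) & EPT_VPID_CAP_AD_BITS) != 0);
      }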