Currently, the PAT changes consist of four independent patches.

1) pat_mmap_single.patch

This adds a new cdevsw routine, d_mmap_single(), which is called to
satisfy an entire mmap() request for a character device.  It is an
optional routine; if it is not present or returns ENODEV, then the mmap()
request falls back to using the device pager and d_mmap().  One can use
d_mmap_single() to validate a request and return ENODEV to have it still
be backed by the device pager.  However, its intended use is to "claim" a
device mmap() request and redirect it to a different VM object that is not
the device's device pager object.  In this case, d_mmap_single() should
return a reference to the desired VM object.  It may also wish to adjust
the starting offset of the mapping relative to the desired VM object.

2) pat_cache_mode.patch

This patch adds caching mode support to the MI VM layer.  What I have done
is to add a new field to each vm_map_entry that holds the cache mode for
that mapping range.  It is treated very similarly to protection
(VM_PROT_*) in that each entry has a cache mode, and adjacent VM map
entries may only be coalesced if they have the same cache mode.  The
actual cache mode is stored in a MI typedef of a uchar.  However, the
valid values for the cache mode are defined by each architecture in a
machine-dependent header.  Drivers can use #ifdef's to see if a specific
cache mode is supported at compile time.  Each architecture is required to
support VM_CACHE_DEFAULT at a minimum.  It would probably be best for
architectures to use the same constant names to describe the same
effective mode.

I also added a VM cache mode to each VM object.  It defaults to
VM_CACHE_DEFAULT.  When an object is inserted into a map, the object's
cache mode is used.

None of this attempts to solve the problem of multiple mappings of the
same page with different cache modes.  I'm not sure that is something we
should solve, however.
It may also be that some other arch may not have that requirement some day
(or at the least, it may not make a lot of sense on, e.g., MIPS, where you
have direct maps in hardware for both WB and UC).  What this does right
now is punt and require the driver to get this correct.  However, if the
driver is careful to ensure that all the mappings are done via a VM object
it controls, then it can use that to ensure that all mappings use the same
cache mode.

Note that this does require changes to the pmap in that a few routines
that create physical mappings now accept a cache mode parameter.  I have
updated amd64 and i386, but I have only run-tested amd64.  Other archs can
simply #define only VM_CACHE_DEFAULT and ignore the cache mode parameters
for now.

I also have additional patches to add an mcache() system call similar to
mprotect() that changes the cache mode on a range.  I have not included
this in this patch as I'm not sure it is useful (it has a high
foot-shooting potential and I believe it is not needed for DRM/Nvidia).
I also have not implemented the pmap routine it needs on amd64 or i386.

3) pat_sg.patch

In addition to this patch, one needs the new files under this tree at
kern/subr_sglist.c, sys/sglist.h, and vm/sg_pager.c.

sglist(9) is a new data type used to describe a scatter/gather list of
physical memory ranges.  I originally developed it for the unmapped buffer
I/O project, which is why it has a bit of a rich API.

On top of this I created a new VM object type and pager: OBJT_SG.  This
pager is very much like the device pager.  However, instead of calling
d_mmap() to determine the physical address of a given page in the VM
object, the physical address is looked up using the scatter/gather list.
Note that scatter/gather lists are immutable after they have been created
(similar to credential structures in the kernel).  These objects can be
useful to export physical address ranges like BARs, etc.
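
As a rough illustration of the lookup that OBJT_SG performs in place of
d_mmap(), the following userland sketch walks a list of physical ranges to
translate an offset within the object into a physical address.  It is a
model only: struct model_seg and model_sg_lookup() are hypothetical names
invented for this example and are not the sglist(9) API or the code in
vm/sg_pager.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one scatter/gather range (not the sglist(9) type). */
struct model_seg {
	uint64_t paddr;		/* physical start address of the range */
	uint64_t len;		/* length of the range in bytes */
};

/*
 * Translate a byte offset within the object into a physical address by
 * walking the ranges in order.  Returns 0 and stores the address in
 * *paddrp on success, or returns -1 if the offset lies past the end of
 * the list.
 */
int
model_sg_lookup(const struct model_seg *segs, size_t nsegs, uint64_t offset,
    uint64_t *paddrp)
{
	size_t i;

	for (i = 0; i < nsegs; i++) {
		if (offset < segs[i].len) {
			*paddrp = segs[i].paddr + offset;
			return (0);
		}
		/* Skip this range and make the offset relative to the next. */
		offset -= segs[i].len;
	}
	return (-1);
}
```

With a list of two ranges using made-up example addresses, a 4 KB range at
0xfee00000 followed by an 8 KB range at 0xd0000000, offset 0x1000 resolves
to 0xd0000000, the first byte of the second range.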

4) pat_mmap_prefault.patch

This adds two new flags to mmap(): MAP_PREFAULT_READ and
MAP_PREFAULT_WRITE.  If either of these flags is set for an mmap()
request, then the pages will be prefaulted using vm_fault() before mmap()
returns.  If MAP_PREFAULT_WRITE is set, then the pages will be prefaulted
for read/write as dirty pages.  Otherwise the pages will be prefaulted for
read.

A small test demo is available at modules/patdev/.  It creates a /dev/pat
device that implements mmap() using d_mmap_single().  It exports two
different VM objects and uses the offset passed to the mmap() system call
to determine which object is exported.

For requests with an offset of 0, a shared anonymous object is used to
satisfy mapping requests.  The object is created on the first mmap()
request and its size is set to the size passed in to the mmap() call.  It
is mapped WC.  Subsequent mmap() calls at offset 0 will all share this
same bit of anonymous memory.  I do not demonstrate doing DMA from this
region.  One would need to wire the pages first.  That could be done by
something like this:

	vm_ooffset_t foff;
	vm_offset_t kva, ofs;
	vm_object_t obj;
	vm_size_t size;
	int rv;

	foff = starting_offset_to_map();
	size = range_to_map();
	obj = my_vm_object();

	/* Map the object into the kernel_map. */
	vm_object_reference(obj);
	kva = vm_map_min(kernel_map);
	/* Split foff into a page-aligned offset and an in-page remainder. */
	ofs = foff & PAGE_MASK;
	foff = trunc_page(foff);
	size = round_page(size + ofs);
	rv = vm_map_find(kernel_map, obj, foff, &kva, size, TRUE,
	    VM_PROT_READ | VM_PROT_WRITE, VM_PROT_READ | VM_PROT_WRITE, 0);
	if (rv != KERN_SUCCESS) {
		vm_object_deallocate(obj);
		/* handle error */
	}

	/* Wire this mapping. */
	rv = vm_map_wire(kernel_map, kva, kva + size,
	    VM_MAP_WIRE_SYSTEM | VM_MAP_WIRE_NOHOLES);
	if (rv != KERN_SUCCESS) {
		vm_map_remove(kernel_map, kva, kva + size);
		/* handle error */
	}
	bus_dmamap_load(..., kva, size, ...);

Later, once the DMA is finished, the buffer can be unmapped and unwired
using vm_map_remove():

	vm_map_remove(kernel_map, kva, kva + size);

The second object that the test device exports is a scatter/gather object
(OBJT_SG).  When the module is loaded, it creates a scatter/gather list
with a single entry that covers the local APIC.  It then creates a VM
object using that list and sets its cache mode to UC.  mmap() requests
that have a starting offset of PAGE_SIZE use this object.  Note that I set
the starting offset of the internal mapping request to 0 in my
d_mmap_single() handler so that the resulting mapping starts at the
beginning of the VM object.  In this case, the effect is that doing:

	fd = open("/dev/pat", O_RDWR);
	r = mmap(0, getpagesize(), PROT_READ, MAP_SHARED, fd, getpagesize());

actually maps the local APIC into a process' address space at 'r'.
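
The kernel-mapping example earlier in these notes splits the starting
offset into a page-aligned base and a remainder within the first page
before calling vm_map_find().  That arithmetic can be tried out in
userland.  The sketch below is a model under stated assumptions: a fixed
4096-byte page, and hypothetical names (MODEL_PAGE_SIZE, struct
model_range, model_page_align()) standing in for the kernel's PAGE_MASK,
trunc_page(), and round_page().

```c
#include <stdint.h>

/* Assumed page size for illustration only; real code uses the kernel's value. */
#define MODEL_PAGE_SIZE	4096ULL
#define MODEL_PAGE_MASK	(MODEL_PAGE_SIZE - 1)

/* Hypothetical result type: the three values the mapping code needs. */
struct model_range {
	uint64_t foff;		/* page-aligned object offset to map */
	uint64_t ofs;		/* offset of the first byte within its page */
	uint64_t size;		/* mapping length, rounded up to whole pages */
};

/*
 * Compute the page-aligned offset, the in-page remainder, and the
 * rounded-up mapping size for a request of 'size' bytes at 'foff'.
 */
struct model_range
model_page_align(uint64_t foff, uint64_t size)
{
	struct model_range r;

	/* Low bits of the offset: position inside the first page. */
	r.ofs = foff & MODEL_PAGE_MASK;
	/* Clear the low bits: same effect as trunc_page(). */
	r.foff = foff & ~MODEL_PAGE_MASK;
	/* Cover the leading remainder, then round up as round_page() would. */
	r.size = (size + r.ofs + MODEL_PAGE_MASK) & ~MODEL_PAGE_MASK;
	return (r);
}
```

For a request starting at offset 0x1234 with a length of 0x2000, this
yields an aligned offset of 0x1000, an in-page offset of 0x234, and a
mapping size of 0x3000, so three pages are mapped even though the length
is only two pages.  Masking with the page size itself instead of the page
size minus one would test a single bit and lose the remainder.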