GENERIC HEAD from 2013-07-16 06:43:37 UTC, r253382M, vmcore.0

GDB: no debug ports present
KDB: debugger backends: ddb
KDB: current backend: ddb
Copyright (c) 1992-2013 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 10.0-CURRENT #0 r253382M: Thu Jul 18 20:38:20 CEST 2013
    pho@t2.osted.lan:/usr/src/sys/amd64/compile/KTR amd64
FreeBSD clang version 3.3 (tags/RELEASE_33/final 183502) 20130610
WARNING: WITNESS option enabled, expect reduced performance.
WARNING: DIAGNOSTIC option enabled, expect reduced performance.
CPU: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz (1995.24-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x206d7  Family = 0x6  Model = 0x2d  Stepping = 7
  Features=0xbfebfbff
  Features2=0x1fbee3ff
  AMD Features=0x2c100800
  AMD Features2=0x1
  TSC: P-state invariant, performance statistics
real memory  = 68719476736 (65536 MB)
avail memory = 32148455424 (30659 MB)
:
Trying to mount root from ufs:/dev/da0p2 [rw]...
Setting hostuuid: 2bde2bde-f4e2-e111-aab2-001e6756b69b.
Setting hostid: 0x0035ff86.
Starting ddb.
Entropy harvesting: interrupts ethernet point_to_point kickstart.
Starting file system checks:
/dev/da0p2: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/da0p2: clean, 91934814 free (67238 frags, 11483447 blocks, 0.1% fragmentation)
/dev/da0p3: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/da0p3: clean, 24057773 free (309 frags, 3007183 blocks, 0.0% fragmentation)
Mounting local file systems:.
Writing entropy file:.
Setting hostname: t2.osted.lan.
Starting Network: lo0 igb0 igb1.
lo0: flags=8049 metric 0 mtu 16384
	options=600003
	inet6 ::1 prefixlen 128
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
	inet 127.0.0.1 netmask 0xff000000
	nd6 options=21
igb0: flags=8843 metric 0 mtu 1500
	options=401bb
	ether 00:1e:67:56:b6:9b
	inet 192.168.1.109 netmask 0xffffff00 broadcast 192.168.1.255
	inet6 fe80::21e:67ff:fe56:b69b%igb0 prefixlen 64 scopeid 0x1
	nd6 options=29
	media: Ethernet autoselect (100baseTX )
	status: active
igb1: flags=8c02 metric 0 mtu 1500
	options=401bb
	ether 00:1e:67:56:b6:9c
	nd6 options=29
	media: Ethernet autoselect
	status: no carrier
Starting devd.
Starting Network: igb1.
igb1: flags=8c02 metric 0 mtu 1500
	options=401bb
	ether 00:1e:67:56:b6:9c
	nd6 options=29
	media: Ethernet autoselect
	status: no carrier
Configuring keyboard: keymap.
ums0: on usbus0
ums1: on usbus2
ums0: 18 buttons and [XYZT] coordinates ID=2
ums1: 3 buttons and [Z] coordinates ID=0
Starting ums0 moused.
Expensive timeout(9) function: 0xffffffff80713a80(0xffffffff8159f0d0) 0.006343942 s
Starting ums1 moused.
add net default: gateway 192.168.1.1
add net fe80::: gateway ::1
add net ff02::: gateway ::1
add net ::ffff:0.0.0.0: gateway ::1
add net ::0.0.0.0: gateway ::1
ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/local/lib
32-bit compatibility ldconfig path: /usr/lib32
Creating and/or trimming log files.
Starting syslogd.
No core dumps found.
Starting rpcbind.
NFS access cache time=60
rpc.umntall: 127.0.0.1: MOUNTPROG: RPC: Program not registered
lock order reversal:
 1st 0xffffff8785ebfd08 bufwait (bufwait) @ kern/vfs_bio.c:3061
 2nd 0xfffffe0012c48400 dirhash (dirhash) @ ufs/ufs/ufs_dirhash.c:284
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffff881df96300
kdb_backtrace() at kdb_backtrace+0x39/frame 0xffffff881df963b0
witness_checkorder() at witness_checkorder+0xd4f/frame 0xffffff881df96440
_sx_xlock() at _sx_xlock+0x75/frame 0xffffff881df96480
ufsdirhash_add() at ufsdirhash_add+0x4c/frame 0xffffff881df964c0
ufs_direnter() at ufs_direnter+0x688/frame 0xffffff881df96580
ufs_mkdir() at ufs_mkdir+0x863/frame 0xffffff881df96780
VOP_MKDIR_APV() at VOP_MKDIR_APV+0x10e/frame 0xffffff881df967b0
kern_mkdirat() at kern_mkdirat+0x20e/frame 0xffffff881df969a0
amd64_syscall() at amd64_syscall+0x282/frame 0xffffff881df96ab0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xffffff881df96ab0
--- syscall (136, FreeBSD ELF64, sys_mkdir), rip = 0x800931dba, rsp = 0x7fffffffd788, rbp = 0x7fffffffdc70 ---
Clearing /tmp (X related).
Starting nfsuserd.
Starting mountd.
Starting nfsd.
Updating motd:.
Mounting late file systems:.
Starting ntpd.
Starting powerd.
Configuring syscons: keymap blanktime.
Performing sanity check on sshd configuration.
Starting sshd.
Starting cron.
Local package initialization: backup watchdogd.
Starting default mousedmoused: unable to open /dev/psm0: No such file or directory
.
Thu Jul 18 21:05:19 CEST 2013

FreeBSD/amd64 (t2.osted.lan) (console)

login: Expensive timeout(9) function: 0xffffffff80713a80(0xffffffff8159f0d0) 0.012733896 s
Jul 18 21:05:33 t2 su: pho to root on /dev/pts/1
20130718 21:05:39 all (1/1): marcus2.sh
lock order reversal:
 1st 0xfffffe0012834c98 ufs (ufs) @ kern/vfs_lookup.c:517
 2nd 0xffffff8785f1c298 bufwait (bufwait) @ ufs/ffs/ffs_vnops.c:262
 3rd 0xfffffe01e209a068 ufs (ufs) @ kern/vfs_subr.c:2099
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffff881e3e1cb0
kdb_backtrace() at kdb_backtrace+0x39/frame 0xffffff881e3e1d60
witness_checkorder() at witness_checkorder+0xd4f/frame 0xffffff881e3e1df0
__lockmgr_args() at __lockmgr_args+0x6f2/frame 0xffffff881e3e1f20
ffs_lock() at ffs_lock+0x92/frame 0xffffff881e3e1f70
VOP_LOCK1_APV() at VOP_LOCK1_APV+0xf5/frame 0xffffff881e3e1fa0
_vn_lock() at _vn_lock+0xc6/frame 0xffffff881e3e2010
vget() at vget+0x70/frame 0xffffff881e3e2060
vfs_hash_get() at vfs_hash_get+0xf5/frame 0xffffff881e3e20b0
ffs_vgetf() at ffs_vgetf+0x41/frame 0xffffff881e3e2140
softdep_sync_buf() at softdep_sync_buf+0x8fa/frame 0xffffff881e3e21f0
ffs_syncvnode() at ffs_syncvnode+0x258/frame 0xffffff881e3e2270
ffs_truncate() at ffs_truncate+0x5f3/frame 0xffffff881e3e2450
ufs_direnter() at ufs_direnter+0x891/frame 0xffffff881e3e2510
ufs_makeinode() at ufs_makeinode+0x573/frame 0xffffff881e3e26d0
VOP_CREATE_APV() at VOP_CREATE_APV+0x108/frame 0xffffff881e3e2700
vn_open_cred() at vn_open_cred+0x2f0/frame 0xffffff881e3e2850
kern_openat() at kern_openat+0x1f5/frame 0xffffff881e3e29a0
amd64_syscall() at amd64_syscall+0x282/frame 0xffffff881e3e2ab0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xffffff881e3e2ab0
--- syscall (5, FreeBSD ELF64, sys_open), rip = 0x800b4fb0a, rsp = 0x7fffffffd4d8, rbp = 0x7fffffffd580 ---
lock order reversal:
 1st 0xfffffe0012f1ea28 ufs (ufs) @ kern/vfs_mount.c:1237
 2nd 0xfffffe0062b30a28 devfs (devfs) @ ufs/ffs/ffs_softdep.c:1870
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffff881e5ea3b0
kdb_backtrace() at kdb_backtrace+0x39/frame 0xffffff881e5ea460
witness_checkorder() at witness_checkorder+0xd4f/frame 0xffffff881e5ea4f0
__lockmgr_args() at __lockmgr_args+0x6f2/frame 0xffffff881e5ea620
vop_stdlock() at vop_stdlock+0x3c/frame 0xffffff881e5ea640
VOP_LOCK1_APV() at VOP_LOCK1_APV+0xf5/frame 0xffffff881e5ea670
_vn_lock() at _vn_lock+0xc6/frame 0xffffff881e5ea6e0
softdep_flushworklist() at softdep_flushworklist+0x70/frame 0xffffff881e5ea740
ffs_sync() at ffs_sync+0x29d/frame 0xffffff881e5ea810
dounmount() at dounmount+0x360/frame 0xffffff881e5ea890
sys_unmount() at sys_unmount+0x376/frame 0xffffff881e5ea9a0
amd64_syscall() at amd64_syscall+0x282/frame 0xffffff881e5eaab0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xffffff881e5eaab0
--- syscall (22, FreeBSD ELF64, sys_unmount), rip = 0x800885d7a, rsp = 0x7fffffffceb8, rbp = 0x7fffffffcfd0 ---
20130718 21:29:09 all (1/1): marcus2.sh
20130718 21:51:10 all (1/1): marcus2.sh
20130718 22:13:56 all (1/1): marcus2.sh
20130718 22:35:40 all (1/1): marcus2.sh
20130718 23:25:26 all (1/2): marcus2.sh
20130718 23:47:14 all (2/2): swap.sh
20130718 23:56:41 all (1/2): swap.sh
20130719 00:06:06 all (2/2): marcus2.sh
20130719 00:31:00 all (1/2): swap.sh
20130719 00:40:08 all (2/2): marcus2.sh
20130719 01:03:01 all (1/2): swap.sh
20130719 01:12:02 all (2/2): marcus2.sh
20130719 01:35:08 all (1/2): swap.sh
Expensive timeout(9) function: 0xffffffff80b2a5d0(0) 0.402744239 s
20130719 01:45:14 all (2/2): marcus2.sh
20130719 02:07:49 all (1/2): marcus2.sh
20130719 02:29:24 all (2/2): swap.sh
20130719 02:38:27 all (1/2): swap.sh
20130719 02:47:41 all (2/2): marcus2.sh
20130719 03:11:13 all (1/2): marcus2.sh
20130719 03:35:00 all (2/2): swap.sh
20130719 03:52:04 all (1/2): swap.sh
20130719 04:01:59 all (2/2): marcus2.sh
20130719 04:25:50 all (1/2): swap.sh
20130719 04:35:21 all (2/2): marcus2.sh
20130719 04:56:47 all (1/2): marcus2.sh
20130719 05:18:39 all (2/2): swap.sh
20130719 05:28:25 all (1/2): swap.sh
20130719 05:38:16 all (2/2): marcus2.sh
20130719 05:59:40 all (1/2): marcus2.sh
20130719 06:21:29 all (2/2): swap.sh
20130719 06:30:46 all (1/2): swap.sh
20130719 06:35:43 all (1/213): nullfs.sh
20130719 06:38:55 all (2/213): symlink2.sh
20130719 06:39:03 all (3/213): syscall5.sh
20130719 07:09:26 all (4/213): linger4.sh
20130719 07:28:12 all (5/213): crossmp.sh
20130719 07:40:44 all (6/213): trim5.sh
lock order reversal:
 1st 0xfffffe0012f1ea28 ufs (ufs) @ kern/vfs_mount.c:1237
 2nd 0xfffffe05cf1a7130 snaplk (snaplk) @ ufs/ffs/ffs_snapshot.c:2102
 3rd 0xfffffe04f3e2b7b8 ufs (ufs) @ ufs/ffs/ffs_snapshot.c:2103
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffff881f7ec400
kdb_backtrace() at kdb_backtrace+0x39/frame 0xffffff881f7ec4b0
witness_checkorder() at witness_checkorder+0xd4f/frame 0xffffff881f7ec540
__lockmgr_args() at __lockmgr_args+0x6f2/frame 0xffffff881f7ec670
ffs_snapshot_unmount() at ffs_snapshot_unmount+0x10e/frame 0xffffff881f7ec6d0
ffs_flushfiles() at ffs_flushfiles+0xe7/frame 0xffffff881f7ec740
softdep_flushfiles() at softdep_flushfiles+0x17f/frame 0xffffff881f7ec7a0
ffs_unmount() at ffs_unmount+0x1a2/frame 0xffffff881f7ec810
dounmount() at dounmount+0x39e/frame 0xffffff881f7ec890
sys_unmount() at sys_unmount+0x376/frame 0xffffff881f7ec9a0
amd64_syscall() at amd64_syscall+0x282/frame 0xffffff881f7ecab0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xffffff881f7ecab0
--- syscall (22, FreeBSD ELF64, sys_unmount), rip = 0x800885d7a, rsp = 0x7fffffffce98, rbp = 0x7fffffffcfb0 ---
20130719 07:40:45 all (7/213): nullfs10.sh
20130719 07:40:46 all (8/213): tmpfs9.sh
20130719 07:51:50 all (9/213): msdos.sh
20130719 08:01:52 all (10/213): fts.sh
panic: vm_radix_remove: impossible to locate the key
cpuid = 1
KDB: enter: panic
[ thread pid 32189 tid 101386 ]
Stopped at      kdb_enter+0x3e: movq    $0,kdb_why
db> call doadump
Dumping 1910 out of 31644 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
Dump complete
= 0
db> reset
cpu_reset: Restarting BSP
cpu_reset_proxy: Stopped CPU 1

(kgdb) bt
#0  doadump (textdump=0x812c7a38) at pcpu.h:236
#1  0xffffffff8033b3d5 in db_fncall (dummy1=, dummy2=, dummy3=, dummy4=) at ../../../ddb/db_command.c:578
#2  0xffffffff8033b0bd in db_command (cmd_table=) at ../../../ddb/db_command.c:449
#3  0xffffffff8033ae34 in db_command_loop () at ../../../ddb/db_command.c:502
#4  0xffffffff8033d7e0 in db_trap (type=, code=0x0) at ../../../ddb/db_main.c:231
#5  0xffffffff808f29c3 in kdb_trap (type=0x3, code=0x0, tf=) at ../../../kern/subr_kdb.c:654
#6  0xffffffff80cac67b in trap (frame=0xffffff881f700fa0) at ../../../amd64/amd64/trap.c:579
#7  0xffffffff80c95672 in calltrap () at ../../../amd64/amd64/exception.S:232
#8  0xffffffff808f219e in kdb_enter (why=0xffffffff80f74545 "panic", msg=) at cpufunc.h:63
#9  0xffffffff808bc176 in vpanic (fmt=, ap=) at ../../../kern/kern_shutdown.c:747
#10 0xffffffff808bc1e3 in panic (fmt=) at ../../../kern/kern_shutdown.c:683
#11 0xffffffff80b485b7 in vm_radix_remove (rtree=, index=) at ../../../vm/vm_radix.c:683
#12 0xffffffff80b3ed5e in vm_page_alloc (object=0xfffffe067e57f4b0, pindex=0x1, req=0x10222) at ../../../vm/vm_page.c:1210
#13 0xffffffff80b40b70 in vm_page_grab (object=0xfffffe067e57f4b0, pindex=0x1, allocflags=0x112a2) at ../../../vm/vm_page.c:2617
#14 0xffffffff80948251 in allocbuf (bp=0xffffff8789550540, size=0x2000) at ../../../kern/vfs_bio.c:3497
#15 0xffffffff80adedfd in ffs_realloccg (ip=0xfffffe0629b70dc8, lbprev=0x0, bprev=0x308e8, bpref=Cannot access memory at address 0x0
) at ../../../ufs/ffs/ffs_alloc.c:335
#16 0xffffffff80ae5e3f in ffs_balloc_ufs2 (vp=0xfffffe04cc34a4e0, startoffset=, size=, cred=0xfffffe02d1065e00, flags=, bpp=0xffffff881f7015d0) at ../../../ufs/ffs/ffs_balloc.c:755
#17 0xffffffff80b1686b in ufs_direnter (dvp=0xfffffe04cc34a4e0, tvp=0xfffffe0123014750, dirp=0xffffff881f701638, cnp=, newdirbp=0x0, isrename=0x0) at ../../../ufs/ufs/ufs_lookup.c:912
#18 0xffffffff80b1d26f in ufs_link (ap=) at ../../../ufs/ufs/ufs_vnops.c:1029
#19 0xffffffff80d56f0d in VOP_LINK_APV (vop=, a=) at vnode_if.c:1473
#20 0xffffffff8096aba2 in kern_linkat (td=, fd1=, fd2=0xffffff9c, path1=, path2=0x7fffffffd4d0, segflg=, follow=0x40) at vnode_if.h:601
#21 0xffffffff8096a858 in sys_link (td=0xffffff881f700f60, uap=) at ../../../kern/vfs_syscalls.c:1538
#22 0xffffffff80cad2e2 in amd64_syscall (td=0xfffffe0282100000, traced=0x0) at subr_syscall.c:134
#23 0xffffffff80c9595b in Xfast_syscall () at ../../../amd64/amd64/exception.S:391
#24 0x0000000800a98e7a in ?? ()
Previous frame inner to this frame (corrupt stack?)
Current language:  auto; currently minimal
(kgdb) f 11
#11 0xffffffff80b485b7 in vm_radix_remove (rtree=, index=) at ../../../vm/vm_radix.c:683
683                     panic("vm_radix_remove: impossible to locate the key");
(kgdb) l
678                     return;
679             }
680             parent = NULL;
681             for (;;) {
682                     if (rnode == NULL)
683                             panic("vm_radix_remove: impossible to locate the key");
684                     slot = vm_radix_slot(index, rnode->rn_clev);
685                     if (vm_radix_isleaf(rnode->rn_child[slot])) {
686                             m = vm_radix_topage(rnode->rn_child[slot]);
687                             if (m->pindex != index)
(kgdb) info loc
m =
rnode =
(kgdb)

$ svn diff -x -p /usr/src/sys
Index: /usr/src/sys/amd64/amd64/pmap.c
===================================================================
--- /usr/src/sys/amd64/amd64/pmap.c	(revision 253382)
+++ /usr/src/sys/amd64/amd64/pmap.c	(working copy)
@@ -283,7 +283,7 @@ static boolean_t pmap_enter_pde(pmap_t pmap, vm_of
 static vm_page_t pmap_enter_quick_locked(pmap_t pmap, vm_offset_t va,
     vm_page_t m, vm_prot_t prot, vm_page_t mpte, struct rwlock **lockp);
 static void pmap_fill_ptp(pt_entry_t *firstpte, pt_entry_t newpte);
-static void pmap_insert_pt_page(pmap_t pmap, vm_page_t mpte);
+static int pmap_insert_pt_page(pmap_t pmap, vm_page_t mpte);
 static boolean_t pmap_is_modified_pvh(struct md_page *pvh);
 static boolean_t pmap_is_referenced_pvh(struct md_page *pvh);
 static void pmap_kenter_attr(vm_offset_t va, vm_paddr_t pa, int mode);
@@ -1525,12 +1525,12 @@ pmap_add_delayed_free_list(vm_page_t m, vm_page_t
  * for mapping a distinct range of virtual addresses. The pmap's collection is
  * ordered by this virtual address range.
  */
-static __inline void
+static __inline int
 pmap_insert_pt_page(pmap_t pmap, vm_page_t mpte)
 {

 	PMAP_LOCK_ASSERT(pmap, MA_OWNED);
-	vm_radix_insert(&pmap->pm_root, mpte);
+	return (vm_radix_insert(&pmap->pm_root, mpte));
 }

 /*
@@ -3393,7 +3393,13 @@ setpte:
 	    ("pmap_promote_pde: page table page is out of range"));
 	KASSERT(mpte->pindex == pmap_pde_pindex(va),
 	    ("pmap_promote_pde: page table page's pindex is wrong"));
-	pmap_insert_pt_page(pmap, mpte);
+	if (pmap_insert_pt_page(pmap, mpte)) {
+		atomic_add_long(&pmap_pde_p_failures, 1);
+		CTR2(KTR_PMAP,
+		    "pmap_promote_pde: failure for va %#lx in pmap %p", va,
+		    pmap);
+		return;
+	}

 	/*
 	 * Promote the pv entries.
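[Editorial sketch, not part of the patch: the common thread of the hunks above and below is that vm_radix_insert() can now fail with ENOMEM instead of panicking, so vm_page_insert() and its relatives return a status that every consumer has to handle. A minimal C sketch of the recovery idiom the patch applies at the call sites; the helper name is made up, and real callers also revalidate their own state after reacquiring the object lock.]

	/* Sketch only; "obj", "m" and "pindex" are placeholders. */
	static void
	example_insert_or_wait(vm_page_t m, vm_object_t obj, vm_pindex_t pindex)
	{

		VM_OBJECT_WLOCK(obj);
		while (vm_page_insert(m, obj, pindex) != 0) {
			/*
			 * No radix node could be allocated.  Drop the
			 * object lock, let VM_WAIT sleep until the page
			 * daemon frees memory, then retry the insert.
			 */
			VM_OBJECT_WUNLOCK(obj);
			VM_WAIT;
			VM_OBJECT_WLOCK(obj);
		}
		VM_OBJECT_WUNLOCK(obj);
	}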
Index: /usr/src/sys/dev/drm2/i915/i915_gem.c
===================================================================
--- /usr/src/sys/dev/drm2/i915/i915_gem.c	(revision 253382)
+++ /usr/src/sys/dev/drm2/i915/i915_gem.c	(working copy)
@@ -64,6 +64,9 @@ __FBSDID("$FreeBSD$");
 #include
 #include
+#include
+#include
+
 static void i915_gem_object_flush_cpu_write_domain(
     struct drm_i915_gem_object *obj);
 static uint32_t i915_gem_get_gtt_size(struct drm_device *dev, uint32_t size,
@@ -1440,8 +1443,14 @@ unlocked_vmobj:
 			vm_page_sleep(m, "915pbs");
 			goto retry;
 		}
+		if (vm_page_insert(m, vm_obj, OFF_TO_IDX(offset))) {
+			DRM_UNLOCK(dev);
+			VM_OBJECT_WUNLOCK(vm_obj);
+			VM_WAIT;
+			VM_OBJECT_WLOCK(vm_obj);
+			goto retry;
+		}
 		m->valid = VM_PAGE_BITS_ALL;
-		vm_page_insert(m, vm_obj, OFF_TO_IDX(offset));
 have_page:
 		*mres = m;
 		vm_page_busy(m);
Index: /usr/src/sys/dev/drm2/ttm/ttm_bo_vm.c
===================================================================
--- /usr/src/sys/dev/drm2/ttm/ttm_bo_vm.c	(revision 253382)
+++ /usr/src/sys/dev/drm2/ttm/ttm_bo_vm.c	(working copy)
@@ -47,6 +47,7 @@ __FBSDID("$FreeBSD$");
 #include
 #include
+#include

 #define TTM_BO_VM_NUM_PREFAULT 16

@@ -218,9 +219,16 @@ reserve:
 			ttm_bo_unreserve(bo);
 			goto retry;
 		}
+		if (vm_page_insert(m, vm_obj, OFF_TO_IDX(offset))) {
+			VM_OBJECT_WUNLOCK(vm_obj);
+			VM_WAIT;
+			VM_OBJECT_WLOCK(vm_obj);
+			ttm_mem_io_unlock(man);
+			ttm_bo_unreserve(bo);
+			goto retry;
+		}
 		m->valid = VM_PAGE_BITS_ALL;
 		*mres = m;
-		vm_page_insert(m, vm_obj, OFF_TO_IDX(offset));
 		vm_page_busy(m);

 		if (oldm != NULL) {
Index: /usr/src/sys/i386/i386/pmap.c
===================================================================
--- /usr/src/sys/i386/i386/pmap.c	(revision 253382)
+++ /usr/src/sys/i386/i386/pmap.c	(working copy)
@@ -304,7 +304,7 @@ static boolean_t pmap_enter_pde(pmap_t pmap, vm_of
 static vm_page_t pmap_enter_quick_locked(pmap_t pmap, vm_offset_t va,
     vm_page_t m, vm_prot_t prot, vm_page_t mpte);
 static void pmap_flush_page(vm_page_t m);
-static void pmap_insert_pt_page(pmap_t pmap, vm_page_t mpte);
+static int pmap_insert_pt_page(pmap_t pmap, vm_page_t mpte);
 static void pmap_fill_ptp(pt_entry_t *firstpte, pt_entry_t newpte);
 static boolean_t pmap_is_modified_pvh(struct md_page *pvh);
 static boolean_t pmap_is_referenced_pvh(struct md_page *pvh);
@@ -1604,12 +1604,12 @@ pmap_add_delayed_free_list(vm_page_t m, vm_page_t
  * for mapping a distinct range of virtual addresses. The pmap's collection is
  * ordered by this virtual address range.
  */
-static __inline void
+static __inline int
 pmap_insert_pt_page(pmap_t pmap, vm_page_t mpte)
 {

 	PMAP_LOCK_ASSERT(pmap, MA_OWNED);
-	vm_radix_insert(&pmap->pm_root, mpte);
+	return (vm_radix_insert(&pmap->pm_root, mpte));
 }

 /*
@@ -3365,7 +3365,13 @@ setpte:
 	    ("pmap_promote_pde: page table page is out of range"));
 	KASSERT(mpte->pindex == va >> PDRSHIFT,
 	    ("pmap_promote_pde: page table page's pindex is wrong"));
-	pmap_insert_pt_page(pmap, mpte);
+	if (pmap_insert_pt_page(pmap, mpte)) {
+		pmap_pde_p_failures++;
+		CTR2(KTR_PMAP,
+		    "pmap_promote_pde: failure for va %#x in pmap %p", va,
+		    pmap);
+		return;
+	}

 	/*
 	 * Promote the pv entries.
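[Editorial sketch, not part of the patch: the kern_mutex.c hunks that follow extend MTX_RECURSE from an mtx_init()-only flag to a per-acquisition flag, so a single call site may recurse on a mutex that was not initialized as recursive. The patch relies on this for vm_page_queue_free_mtx in vm_page_alloc(). A hypothetical C illustration; all names are invented.]

	static struct mtx example_mtx;

	static void
	example_opt_in_recursion(void)
	{

		mtx_init(&example_mtx, "example", NULL, MTX_DEF);
		mtx_lock(&example_mtx);
		/*
		 * Without LO_RECURSABLE this second acquisition would
		 * previously trip the "recursed on non-recursive mutex"
		 * assertion; with the patch, passing MTX_RECURSE at the
		 * call site permits it for this acquisition only.
		 */
		mtx_lock_flags(&example_mtx, MTX_RECURSE);
		mtx_unlock(&example_mtx);
		mtx_unlock(&example_mtx);
		mtx_destroy(&example_mtx);
	}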
Index: /usr/src/sys/kern/kern_mutex.c
===================================================================
--- /usr/src/sys/kern/kern_mutex.c	(revision 253382)
+++ /usr/src/sys/kern/kern_mutex.c	(working copy)
@@ -218,13 +218,14 @@ __mtx_lock_flags(volatile uintptr_t *c, int opts,
 	KASSERT(LOCK_CLASS(&m->lock_object) == &lock_class_mtx_sleep,
 	    ("mtx_lock() of spin mutex %s @ %s:%d", m->lock_object.lo_name,
 	    file, line));
-	WITNESS_CHECKORDER(&m->lock_object, opts | LOP_NEWORDER | LOP_EXCLUSIVE,
-	    file, line, NULL);
+	WITNESS_CHECKORDER(&m->lock_object, (opts & ~MTX_RECURSE) |
+	    LOP_NEWORDER | LOP_EXCLUSIVE, file, line, NULL);

 	__mtx_lock(m, curthread, opts, file, line);
 	LOCK_LOG_LOCK("LOCK", &m->lock_object, opts, m->mtx_recurse, file,
 	    line);
-	WITNESS_LOCK(&m->lock_object, opts | LOP_EXCLUSIVE, file, line);
+	WITNESS_LOCK(&m->lock_object, (opts & ~MTX_RECURSE) | LOP_EXCLUSIVE,
+	    file, line);
 	curthread->td_locks++;
 }

@@ -271,9 +272,11 @@ __mtx_lock_spin_flags(volatile uintptr_t *c, int o
 	    ("mtx_lock_spin() of sleep mutex %s @ %s:%d",
 	    m->lock_object.lo_name, file, line));
 	if (mtx_owned(m))
-		KASSERT((m->lock_object.lo_flags & LO_RECURSABLE) != 0,
+		KASSERT((m->lock_object.lo_flags & LO_RECURSABLE) != 0 ||
+		    (opts & MTX_RECURSE) != 0,
 	    ("mtx_lock_spin: recursed on non-recursive mutex %s @ %s:%d\n",
 		    m->lock_object.lo_name, file, line));
+	opts &= ~MTX_RECURSE;
 	WITNESS_CHECKORDER(&m->lock_object, opts | LOP_NEWORDER | LOP_EXCLUSIVE,
 	    file, line, NULL);
 	__mtx_lock_spin(m, curthread, opts, file, line);
@@ -335,12 +338,14 @@ _mtx_trylock_flags_(volatile uintptr_t *c, int opt
 	    ("mtx_trylock() of spin mutex %s @ %s:%d", m->lock_object.lo_name,
 	    file, line));

-	if (mtx_owned(m) && (m->lock_object.lo_flags & LO_RECURSABLE) != 0) {
+	if (mtx_owned(m) && ((m->lock_object.lo_flags & LO_RECURSABLE) != 0 ||
+	    (opts & MTX_RECURSE) != 0)) {
 		m->mtx_recurse++;
 		atomic_set_ptr(&m->mtx_lock, MTX_RECURSED);
 		rval = 1;
 	} else
 		rval = _mtx_obtain_lock(m, (uintptr_t)curthread);
+	opts &= ~MTX_RECURSE;

 	LOCK_LOG_TRY("LOCK", &m->lock_object, opts, rval, file, line);
 	if (rval) {
@@ -391,9 +396,11 @@ __mtx_lock_sleep(volatile uintptr_t *c, uintptr_t
 	m = mtxlock2mtx(c);

 	if (mtx_owned(m)) {
-		KASSERT((m->lock_object.lo_flags & LO_RECURSABLE) != 0,
+		KASSERT((m->lock_object.lo_flags & LO_RECURSABLE) != 0 ||
+		    (opts & MTX_RECURSE) != 0,
 	    ("_mtx_lock_sleep: recursed on non-recursive mutex %s @ %s:%d\n",
 		    m->lock_object.lo_name, file, line));
+		opts &= ~MTX_RECURSE;
 		m->mtx_recurse++;
 		atomic_set_ptr(&m->mtx_lock, MTX_RECURSED);
 		if (LOCK_LOG_TEST(&m->lock_object, opts))
@@ -400,6 +407,7 @@ __mtx_lock_sleep(volatile uintptr_t *c, uintptr_t
 			CTR1(KTR_LOCK, "_mtx_lock_sleep: %p recursing", m);
 		return;
 	}
+	opts &= ~MTX_RECURSE;

 #ifdef HWPMC_HOOKS
 	PMC_SOFT_CALL( , , lock, failed);
Index: /usr/src/sys/kern/subr_uio.c
===================================================================
--- /usr/src/sys/kern/subr_uio.c	(revision 253382)
+++ /usr/src/sys/kern/subr_uio.c	(working copy)
@@ -56,6 +56,7 @@ __FBSDID("$FreeBSD$");
 #include
 #include
 #include
+#include
 #include
 #ifdef SOCKET_SEND_COW
 #include
@@ -122,7 +123,12 @@ retry:
 		if (uobject->backing_object != NULL)
 			pmap_remove(map->pmap, uaddr, uaddr + PAGE_SIZE);
 	}
-	vm_page_insert(kern_pg, uobject, upindex);
+	if (vm_page_insert(kern_pg, uobject, upindex)) {
+		VM_OBJECT_WUNLOCK(uobject);
+		VM_WAIT;
+		VM_OBJECT_WLOCK(uobject);
+		goto retry;
+	}
 	vm_page_dirty(kern_pg);
 	VM_OBJECT_WUNLOCK(uobject);
 	vm_map_lookup_done(map, entry);
Index: /usr/src/sys/kern/subr_vmem.c
===================================================================
--- /usr/src/sys/kern/subr_vmem.c	(revision 253382)
+++ /usr/src/sys/kern/subr_vmem.c	(working copy)
@@ -54,6 +54,7 @@ __FBSDID("$FreeBSD$");
 #include
 #include
 #include
+#include
 #include
 #include
@@ -164,6 +165,9 @@ struct vmem_btag {
 #define	BT_END(bt)	((bt)->bt_start + (bt)->bt_size - 1)

 #if defined(DIAGNOSTIC)
+static int do_vmem_check = 1;
+SYSCTL_INT(_debug, OID_AUTO, do_vmem_check, CTLFLAG_RW,
+    &do_vmem_check, 0, "Enable vmem check");
 static void vmem_check(vmem_t *);
 #endif

@@ -618,9 +622,11 @@ vmem_periodic(void *unused, int pending)
 	LIST_FOREACH(vm, &vmem_list, vm_alllist) {
 #ifdef DIAGNOSTIC
 		/* Convenient time to verify vmem state. */
-		VMEM_LOCK(vm);
-		vmem_check(vm);
-		VMEM_UNLOCK(vm);
+		if (do_vmem_check == 1) {
+			VMEM_LOCK(vm);
+			vmem_check(vm);
+			VMEM_UNLOCK(vm);
+		}
 #endif
 		desired = 1 << flsl(vm->vm_nbusytag);
 		desired = MIN(MAX(desired, VMEM_HASHSIZE_MIN),
Index: /usr/src/sys/sys/ktr.h
===================================================================
--- /usr/src/sys/sys/ktr.h	(revision 253382)
+++ /usr/src/sys/sys/ktr.h	(working copy)
@@ -75,7 +75,8 @@
 #define	KTR_INET6	0x10000000	/* IPv6 stack */
 #define	KTR_SCHED	0x20000000	/* Machine parsed sched info. */
 #define	KTR_BUF		0x40000000	/* Buffer cache */
-#define	KTR_ALL		0x7fffffff
+#define	KTR_DEBUG	0x80000000
+#define	KTR_ALL		0xffffffff

 /* Trace classes to compile in */
 #ifdef KTR
Index: /usr/src/sys/vm/device_pager.c
===================================================================
--- /usr/src/sys/vm/device_pager.c	(revision 253382)
+++ /usr/src/sys/vm/device_pager.c	(working copy)
@@ -348,11 +348,12 @@ old_dev_pager_fault(vm_object_t object, vm_ooffset
 		 */
 		page = vm_page_getfake(paddr, memattr);
 		VM_OBJECT_WLOCK(object);
+		if (vm_page_replace(page, object, (*mres)->pindex) != *mres)
+			panic("old_dev_pager_fault: invalid page replacement");
 		vm_page_lock(*mres);
 		vm_page_free(*mres);
 		vm_page_unlock(*mres);
 		*mres = page;
-		vm_page_insert(page, object, pidx);
 	}
 	page->valid = VM_PAGE_BITS_ALL;
 	return (VM_PAGER_OK);
Index: /usr/src/sys/vm/sg_pager.c
===================================================================
--- /usr/src/sys/vm/sg_pager.c	(revision 253382)
+++ /usr/src/sys/vm/sg_pager.c	(working copy)
@@ -186,11 +186,13 @@ sg_pager_getpages(vm_object_t object, vm_page_t *m

 	/* Free the original pages and insert this fake page into the object. */
 	for (i = 0; i < count; i++) {
+		if (i == reqpage &&
+		    vm_page_replace(page, object, offset) != m[i])
+			panic("sg_pager_getpages: invalid place replacement");
 		vm_page_lock(m[i]);
 		vm_page_free(m[i]);
 		vm_page_unlock(m[i]);
 	}
-	vm_page_insert(page, object, offset);
 	m[reqpage] = page;
 	page->valid = VM_PAGE_BITS_ALL;
Index: /usr/src/sys/vm/vm_fault.c
===================================================================
--- /usr/src/sys/vm/vm_fault.c	(revision 253382)
+++ /usr/src/sys/vm/vm_fault.c	(working copy)
@@ -754,9 +754,11 @@ vnode_locked:
 				 * process'es object. The page is
 				 * automatically made dirty.
 				 */
-				vm_page_lock(fs.m);
-				vm_page_rename(fs.m, fs.first_object, fs.first_pindex);
-				vm_page_unlock(fs.m);
+				if (vm_page_rename(fs.m, fs.first_object,
+				    fs.first_pindex)) {
+					unlock_and_deallocate(&fs);
+					goto RetryFault;
+				}
 				vm_page_busy(fs.m);
 				fs.first_m = fs.m;
 				fs.m = NULL;
Index: /usr/src/sys/vm/vm_object.c
===================================================================
--- /usr/src/sys/vm/vm_object.c	(revision 253382)
+++ /usr/src/sys/vm/vm_object.c	(working copy)
@@ -1351,6 +1351,16 @@ retry:
 			VM_OBJECT_WLOCK(new_object);
 			goto retry;
 		}
+
+		/* vm_page_rename() will handle dirty and cache. */
+		if (vm_page_rename(m, new_object, idx)) {
+			VM_OBJECT_WUNLOCK(new_object);
+			VM_OBJECT_WUNLOCK(orig_object);
+			VM_WAIT;
+			VM_OBJECT_WLOCK(orig_object);
+			VM_OBJECT_WLOCK(new_object);
+			goto retry;
+		}
 #if VM_NRESERVLEVEL > 0
 		/*
 		 * If some of the reservation's allocated pages remain with
@@ -1366,10 +1376,6 @@ retry:
 		 */
 			vm_reserv_rename(m, new_object, orig_object,
 			    offidxstart);
 #endif
-		vm_page_lock(m);
-		vm_page_rename(m, new_object, idx);
-		vm_page_unlock(m);
-		/* page automatically made dirty by rename and cache handled */
 		if (orig_object->type == OBJT_SWAP)
 			vm_page_busy(m);
 	}
@@ -1527,21 +1533,14 @@ vm_object_backing_scan(vm_object_t object, int op)
 			("vm_object_backing_scan: object mismatch")
 		);

-		/*
-		 * Destroy any associated swap
-		 */
-		if (backing_object->type == OBJT_SWAP) {
-			swap_pager_freespace(
-				backing_object,
-				p->pindex,
-				1
-			);
-		}
-
 		if (
 		    p->pindex < backing_offset_index ||
 		    new_pindex >= object->size
 		) {
+			if (backing_object->type == OBJT_SWAP)
+				swap_pager_freespace(backing_object,
+				    p->pindex, 1);
+
 			/*
 			 * Page is out of the parent object's range, we
 			 * can simply destroy it.
@@ -1563,6 +1562,10 @@ vm_object_backing_scan(vm_object_t object, int op)
 		    (op & OBSC_COLLAPSE_NOWAIT) != 0 &&
 		    (pp != NULL && pp->valid == 0)
 		) {
+			if (backing_object->type == OBJT_SWAP)
+				swap_pager_freespace(backing_object,
+				    p->pindex, 1);
+
 			/*
 			 * The page in the parent is not (yet) valid.
 			 * We don't know anything about the state of
@@ -1581,6 +1584,10 @@ vm_object_backing_scan(vm_object_t object, int op)
 			    pp != NULL ||
 			    vm_pager_has_page(object, new_pindex, NULL, NULL)
 			) {
+				if (backing_object->type == OBJT_SWAP)
+					swap_pager_freespace(backing_object,
+					    p->pindex, 1);
+
 				/*
 				 * page already exists in parent OR swap exists
 				 * for this location in the parent. Destroy
@@ -1600,6 +1607,31 @@ vm_object_backing_scan(vm_object_t object, int op)
 				continue;
 			}

+			/*
+			 * Page does not exist in parent, rename the
+			 * page from the backing object to the main object.
+			 *
+			 * If the page was mapped to a process, it can remain
+			 * mapped through the rename.
+			 * vm_page_rename() will handle dirty and cache.
+			 */
+			if (vm_page_rename(p, object, new_pindex)) {
+				if (op & OBSC_COLLAPSE_NOWAIT) {
+					p = next;
+					continue;
+				}
+				VM_OBJECT_WLOCK(backing_object);
+				VM_OBJECT_WUNLOCK(object);
+				VM_WAIT;
+				VM_OBJECT_WLOCK(object);
+				VM_OBJECT_WLOCK(backing_object);
+				p = TAILQ_FIRST(&backing_object->memq);
+				continue;
+			}
+			if (backing_object->type == OBJT_SWAP)
+				swap_pager_freespace(backing_object, p->pindex,
+				    1);
+
 #if VM_NRESERVLEVEL > 0
 			/*
 			 * Rename the reservation.
@@ -1607,18 +1639,6 @@ vm_object_backing_scan(vm_object_t object, int op)
 			vm_reserv_rename(p, object, backing_object,
 			    backing_offset_index);
 #endif
-
-			/*
-			 * Page does not exist in parent, rename the
-			 * page from the backing object to the main object.
-			 *
-			 * If the page was mapped to a process, it can remain
-			 * mapped through the rename.
-			 */
-			vm_page_lock(p);
-			vm_page_rename(p, object, new_pindex);
-			vm_page_unlock(p);
-			/* page automatically made dirty by rename */
 		}
 		p = next;
 	}
Index: /usr/src/sys/vm/vm_object.h
===================================================================
--- /usr/src/sys/vm/vm_object.h	(revision 253382)
+++ /usr/src/sys/vm/vm_object.h	(working copy)
@@ -102,7 +102,7 @@ struct vm_object {
 	TAILQ_ENTRY(vm_object) object_list;	/* list of all objects */
 	LIST_HEAD(, vm_object) shadow_head;	/* objects that this is a shadow for */
 	LIST_ENTRY(vm_object) shadow_list;	/* chain of shadow objects */
-	TAILQ_HEAD(, vm_page) memq;		/* list of resident pages */
+	TAILQ_HEAD(respgs, vm_page) memq;	/* list of resident pages */
 	struct vm_radix rtree;			/* root of the resident page radix trie*/
 	vm_pindex_t size;			/* Object size */
 	int generation;				/* generation ID */
Index: /usr/src/sys/vm/vm_page.c
===================================================================
--- /usr/src/sys/vm/vm_page.c	(revision 253382)
+++ /usr/src/sys/vm/vm_page.c	(working copy)
@@ -91,6 +91,7 @@ __FBSDID("$FreeBSD$");
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -159,11 +160,15 @@ SYSCTL_INT(_vm, OID_AUTO, tryrelock_restart, CTLFL
 static uma_zone_t fakepg_zone;

 static struct vnode *vm_page_alloc_init(vm_page_t m);
+static void vm_page_alloc_turn_free(vm_page_t m, int req);
+static void vm_page_cache_turn_free(vm_page_t m);
 static void vm_page_clear_dirty_mask(vm_page_t m, vm_page_bits_t pagebits);
 static void vm_page_enqueue(int queue, vm_page_t m);
 static void vm_page_init_fakepg(void *dummy);
-static void vm_page_insert_after(vm_page_t m, vm_object_t object,
+static int vm_page_insert_after(vm_page_t m, vm_object_t object,
     vm_pindex_t pindex, vm_page_t mpred);
+static void vm_page_insert_radixdone(vm_page_t m, vm_object_t object,
+    vm_page_t mpred);

 SYSINIT(vm_page, SI_SUB_VM, SI_ORDER_SECOND, vm_page_init_fakepg, NULL);

@@ -805,7 +810,7 @@ vm_page_dirty_KBI(vm_page_t m)
  *
  *	The object must be locked.
  */
-void
+int
 vm_page_insert(vm_page_t m, vm_object_t object, vm_pindex_t pindex)
 {
 	vm_page_t mpred;
@@ -812,7 +817,7 @@ vm_page_insert(vm_page_t m, vm_object_t object, vm

 	VM_OBJECT_ASSERT_WLOCKED(object);
 	mpred = vm_radix_lookup_le(&object->rtree, pindex);
-	vm_page_insert_after(m, object, pindex, mpred);
+	return (vm_page_insert_after(m, object, pindex, mpred));
 }

 /*
@@ -825,10 +830,12 @@ vm_page_insert(vm_page_t m, vm_object_t object, vm
  *
  *	The object must be locked.
  */
-static void
+static int
 vm_page_insert_after(vm_page_t m, vm_object_t object, vm_pindex_t pindex,
     vm_page_t mpred)
 {
+	vm_pindex_t sidx;
+	vm_object_t sobj;
 	vm_page_t msucc;

 	VM_OBJECT_ASSERT_WLOCKED(object);
@@ -850,6 +857,8 @@ vm_page_insert_after(vm_page_t m, vm_object_t obje
 	/*
 	 * Record the object/offset pair in this page
 	 */
+	sobj = m->object;
+	sidx = m->pindex;
 	m->object = object;
 	m->pindex = pindex;

@@ -856,11 +865,45 @@ vm_page_insert_after(vm_page_t m, vm_object_t obje
 	/*
 	 * Now link into the object's ordered list of backed pages.
 	 */
+	if (vm_radix_insert(&object->rtree, m)) {
+		m->object = sobj;
+		m->pindex = sidx;
+		return (1);
+	}
+	vm_page_insert_radixdone(m, object, mpred);
+	return (0);
+}
+
+/*
+ *	vm_page_insert_radixdone:
+ *
+ *	Complete page "m" insertion into the specified object after the
+ *	radix trie hooking.
+ *
+ *	The page "mpred" must precede the offset "m->pindex" within the
+ *	specified object.
+ *
+ *	The object must be locked.
+ */
+static void
+vm_page_insert_radixdone(vm_page_t m, vm_object_t object, vm_page_t mpred)
+{
+
+	VM_OBJECT_ASSERT_WLOCKED(object);
+	KASSERT(object != NULL && m->object == object,
+	    ("vm_page_insert_radixdone: page %p has inconsistent object", m));
+	if (mpred != NULL) {
+		KASSERT(mpred->object == object ||
+		    (mpred->flags & PG_SLAB) != 0,
+		    ("vm_page_insert_after: object doesn't contain mpred"));
+		KASSERT(mpred->pindex < m->pindex,
+		    ("vm_page_insert_after: mpred doesn't precede pindex"));
+	}
+
 	if (mpred != NULL)
 		TAILQ_INSERT_AFTER(&object->memq, mpred, m, listq);
 	else
 		TAILQ_INSERT_HEAD(&object->memq, m, listq);
-	vm_radix_insert(&object->rtree, m);

 	/*
 	 * Show that the object has one more resident page.
@@ -997,6 +1040,54 @@ vm_page_prev(vm_page_t m)
 }

 /*
+ * Uses the page mnew as a replacement for an existing page at index
+ * pindex which must be already present in the object.
+ */
+vm_page_t
+vm_page_replace(vm_page_t mnew, vm_object_t object, vm_pindex_t pindex)
+{
+	vm_page_t mold, mpred;
+
+	VM_OBJECT_ASSERT_WLOCKED(object);
+
+	/*
+	 * This function mostly follows vm_page_insert() and
+	 * vm_page_remove() without the radix, object count and vnode
+	 * dance.  Double check such functions for more comments.
+	 */
+	mpred = vm_radix_lookup(&object->rtree, pindex);
+	KASSERT(mpred != NULL,
+	    ("vm_page_replace: replacing page not present with pindex"));
+	mpred = TAILQ_PREV(mpred, respgs, listq);
+	if (mpred != NULL)
+		KASSERT(mpred->pindex < pindex,
+		    ("vm_page_insert_after: mpred doesn't precede pindex"));
+
+	mnew->object = object;
+	mnew->pindex = pindex;
+	mold = vm_radix_replace(&object->rtree, mnew, pindex);
+
+	/* Detach the old page from the resident tailq. */
+	TAILQ_REMOVE(&object->memq, mold, listq);
+	vm_page_lock(mold);
+	if (mold->oflags & VPO_BUSY) {
+		mold->oflags &= ~VPO_BUSY;
+		vm_page_flash(mold);
+	}
+	mold->object = NULL;
+	vm_page_unlock(mold);
+
+	/* Insert the new page in the resident tailq. */
+	if (mpred != NULL)
+		TAILQ_INSERT_AFTER(&object->memq, mpred, mnew, listq);
+	else
+		TAILQ_INSERT_HEAD(&object->memq, mnew, listq);
+	if (pmap_page_is_write_mapped(mnew))
+		vm_object_set_writeable_dirty(object);
+	return (mold);
+}
+
+/*
  *	vm_page_rename:
  *
  *	Move the given memory entry from its
@@ -1014,15 +1105,47 @@ vm_page_prev(vm_page_t m)
  *	or vm_page_dirty() will panic.  Dirty pages are not allowed
  *	on the cache.
  *
- *	The objects must be locked.  The page must be locked if it is managed.
+ *	The objects must be locked.
  */
-void
+int
 vm_page_rename(vm_page_t m, vm_object_t new_object, vm_pindex_t new_pindex)
 {
+	vm_page_t mpred;
+	vm_pindex_t opidx;

 	VM_OBJECT_ASSERT_WLOCKED(new_object);
+
+	mpred = vm_radix_lookup_le(&new_object->rtree, new_pindex);
+	KASSERT(mpred == NULL || mpred->pindex != new_pindex,
+	    ("vm_page_rename: pindex already renamed"));
+
+	/*
+	 * Create a custom version of vm_page_insert() which does not depend
+	 * by m_prev and can cheat on the implementation aspects of the
+	 * function.
+	 */
+	opidx = m->pindex;
+	m->pindex = new_pindex;
+	if (vm_radix_insert(&new_object->rtree, m)) {
+		m->pindex = opidx;
+		return (1);
+	}
+
+	/*
+	 * The operation cannot fail anymore.  The removal must happen before
+	 * the listq iterator is tainted.
+	 */
+	m->pindex = opidx;
+	vm_page_lock(m);
 	vm_page_remove(m);
-	vm_page_insert(m, new_object, new_pindex);
+
+	/* Return back to the new pindex to complete vm_page_insert(). */
+	m->pindex = new_pindex;
+	m->object = new_object;
+	vm_page_unlock(m);
+	vm_page_insert_radixdone(m, new_object, mpred);
 	vm_page_dirty(m);
+	return (0);
 }

 /*
@@ -1048,14 +1171,8 @@ vm_page_cache_free(vm_object_t object, vm_pindex_t
 		if (end != 0 && m->pindex >= end)
 			break;
 		vm_radix_remove(&object->cache, m->pindex);
-		m->object = NULL;
-		m->valid = 0;
-		/* Clear PG_CACHED and set PG_FREE. */
-		m->flags ^= PG_CACHED | PG_FREE;
-		KASSERT((m->flags & (PG_CACHED | PG_FREE)) == PG_FREE,
-		    ("vm_page_cache_free: page %p has inconsistent flags", m));
-		cnt.v_cache_count--;
-		cnt.v_free_count++;
+		vm_page_cache_turn_free(m);
+		CTR3(KTR_DEBUG, "%s: page %p flags %x", __func__, m, m->flags);
 	}
 	empty = vm_radix_is_empty(&object->cache);
 	mtx_unlock(&vm_page_queue_free_mtx);
@@ -1135,7 +1252,11 @@ vm_page_cache_transfer(vm_object_t orig_object, vm
 		/* Update the page's object and offset. */
 		m->object = new_object;
 		m->pindex -= offidxstart;
-		vm_radix_insert(&new_object->cache, m);
+		if (vm_radix_insert(&new_object->cache, m)) {
+			vm_page_cache_turn_free(m);
+			CTR3(KTR_DEBUG, "%s: page %p flags %x", __func__, m,
+			    m->flags);
+		}
 	}
 	mtx_unlock(&vm_page_queue_free_mtx);
 }
@@ -1223,7 +1344,13 @@ vm_page_alloc(vm_object_t object, vm_pindex_t pind
 		KASSERT(mpred == NULL || mpred->pindex != pindex,
 		    ("vm_page_alloc: pindex already allocated"));
 	}
-	mtx_lock(&vm_page_queue_free_mtx);
+
+	/*
+	 * The page allocation request can came from consumers which already
+	 * hold the free page queue mutex, like vm_page_insert() in
+	 * vm_page_cache().
+	 */
+	mtx_lock_flags(&vm_page_queue_free_mtx, MTX_RECURSE);
 	if (cnt.v_free_count + cnt.v_cache_count > cnt.v_free_reserved ||
 	    (req_class == VM_ALLOC_SYSTEM &&
 	    cnt.v_free_count + cnt.v_cache_count > cnt.v_interrupt_free_min) ||
@@ -1239,6 +1366,7 @@ vm_page_alloc(vm_object_t object, vm_pindex_t pind
 			mtx_unlock(&vm_page_queue_free_mtx);
 			return (NULL);
 		}
+		CTR2(KTR_DEBUG, "%s: retrieve 1 page %p", __func__, m);
 		if (vm_phys_unfree_page(m))
 			vm_phys_set_pool(VM_FREEPOOL_DEFAULT, m, 0);
 #if VM_NRESERVLEVEL > 0
@@ -1260,11 +1388,14 @@ vm_page_alloc(vm_object_t object, vm_pindex_t pind
 #endif
 			m = vm_phys_alloc_pages(object != NULL ?
 			    VM_FREEPOOL_DEFAULT : VM_FREEPOOL_DIRECT, 0);
+			CTR2(KTR_DEBUG, "%s: retrieve 2 page %p", __func__, m);
 #if VM_NRESERVLEVEL > 0
 			if (m == NULL && vm_reserv_reclaim_inactive()) {
 				m = vm_phys_alloc_pages(object != NULL ?
 				    VM_FREEPOOL_DEFAULT : VM_FREEPOOL_DIRECT,
 				    0);
+				CTR2(KTR_DEBUG, "%s: retrieve 3 page %p",
+				    __func__, m);
 			}
 #endif
 		}
@@ -1344,11 +1475,20 @@ vm_page_alloc(vm_object_t object, vm_pindex_t pind
 	m->act_count = 0;

 	if (object != NULL) {
+		if (vm_page_insert_after(m, object, pindex, mpred)) {
+			vm_page_alloc_turn_free(m, req);
+
+			/* See the comment below about hold count. */
+			if (vp != NULL)
+				vdrop(vp);
+			pagedaemon_wakeup();
+			return (NULL);
+		}
+
 		/* Ignore device objects; the pager sets "memattr" for them. */
 		if (object->memattr != VM_MEMATTR_DEFAULT &&
 		    (object->flags & OBJ_FICTITIOUS) == 0)
 			pmap_page_set_memattr(m, object->memattr);
-		vm_page_insert_after(m, object, pindex, mpred);
 	} else
 		m->pindex = pindex;

@@ -1414,7 +1554,7 @@ vm_page_alloc_contig(vm_object_t object, vm_pindex
     vm_paddr_t boundary, vm_memattr_t memattr)
 {
 	struct vnode *drop;
-	vm_page_t deferred_vdrop_list, m, m_ret;
+	vm_page_t deferred_vdrop_list, m, m_tmp, m_ret;
 	u_int flags, oflags;
 	int req_class;

@@ -1508,12 +1648,32 @@ retry:
 		m->wire_count = 1;
 		/* Unmanaged pages don't use "act_count". */
 		m->oflags = oflags;
+		if (object != NULL) {
+			if (vm_page_insert(m, object, pindex)) {
+				for (m_tmp = m_ret; m_tmp < m; m_tmp++) {
+					vm_page_lock(m_tmp);
+					vm_page_remove(m_tmp);
+					vm_page_unlock(m_tmp);
+					if (pmap_page_get_memattr(m) !=
+					    VM_MEMATTR_DEFAULT)
+						pmap_page_set_memattr(m,
+						    VM_MEMATTR_DEFAULT);
+				}
+				for (m = m_ret; m < &m_ret[npages]; m++)
+					vm_page_alloc_turn_free(m, req);
+				while (deferred_vdrop_list != NULL) {
+					vdrop((struct vnode *)deferred_vdrop_list->pageq.tqe_prev);
+					deferred_vdrop_list =
+					    deferred_vdrop_list->pageq.tqe_next;
+				}
+				if (vm_paging_needed())
+					pagedaemon_wakeup();
+				return (NULL);
+			}
+		} else
+			m->pindex = pindex;
 		if (memattr != VM_MEMATTR_DEFAULT)
 			pmap_page_set_memattr(m, memattr);
-		if (object != NULL)
-			vm_page_insert(m, object, pindex);
-		else
-			m->pindex = pindex;
 		pindex++;
 	}
 	while (deferred_vdrop_list != NULL) {
@@ -1883,6 +2043,66 @@ vm_page_free_wakeup(void)
 }

 /*
+ * Turn a cached page into a free page, by changing its attributes.
+ * Keep the statistics up-to-date.
+ *
+ * The free page queue must be locked.
+ */
+static void
+vm_page_cache_turn_free(vm_page_t m)
+{
+
+	mtx_assert(&vm_page_queue_free_mtx, MA_OWNED);
+
+	m->object = NULL;
+	m->valid = 0;
+	/* Clear PG_CACHED and set PG_FREE. */
+	m->flags ^= PG_CACHED | PG_FREE;
+	KASSERT((m->flags & (PG_CACHED | PG_FREE)) == PG_FREE,
+	    ("vm_page_cache_free: page %p has inconsistent flags", m));
+	cnt.v_cache_count--;
+	cnt.v_free_count++;
+}
+
+/*
+ * Turn a partially allocated page into a free page.
+ * Keep the statistics up-to-date.
+ *
+ * The free page queue must be locked.
+ */
+static void
+vm_page_alloc_turn_free(vm_page_t m, int req)
+{
+
+	KASSERT(m->object == NULL,
+	    ("vm_page_alloc_turn_free: invalid object %p for page %p",
+	    m->object, m));
+
+	if (req & VM_ALLOC_WIRED) {
+		m->wire_count = 0;
+		atomic_subtract_int(&cnt.v_wire_count, 1);
+	}
+	m->object = NULL;
+	m->flags &= ~PG_ZERO;
+	PCPU_INC(cnt.v_tfree);
+	m->valid = 0;
+	m->oflags = 0;
+	mtx_lock_flags(&vm_page_queue_free_mtx, MTX_RECURSE);
+	vm_page_undirty(m);
+	m->flags |= PG_FREE;
+	cnt.v_free_count++;
+#if VM_NRESERVLEVEL > 0
+	if (!vm_reserv_free_page(m))
+#else
+	if (TRUE)
+#endif
+		vm_phys_free_pages(m, 0);
+	vm_page_zero_idle_wakeup();
+	vm_page_free_wakeup();
+	mtx_unlock(&vm_page_queue_free_mtx);
+}
+
+/*
 *	vm_page_free_toq:
 *
 *	Returns the given page to the free list,
@@ -2184,7 +2404,6 @@ vm_page_cache(vm_page_t m)
 	}
 	KASSERT((m->flags & PG_CACHED) == 0,
 	    ("vm_page_cache: page %p is already cached", m));
-	PCPU_INC(cnt.v_tcached);

 	/*
 	 * Remove the page from the paging queues.
@@ -2211,10 +2430,27 @@
 	 */
 	m->flags &= ~PG_ZERO;
 	mtx_lock(&vm_page_queue_free_mtx);
+	cache_was_empty = vm_radix_is_empty(&object->cache);
+	if (vm_radix_insert(&object->cache, m)) {
+		PCPU_INC(cnt.v_tfree);
+		m->flags |= PG_FREE;
+		cnt.v_free_count++;
+#if VM_NRESERVLEVEL > 0
+		if (!vm_reserv_free_page(m))
+#else
+		if (TRUE)
+#endif
+			vm_phys_free_pages(m, 0);
+		vm_page_free_wakeup();
+		mtx_unlock(&vm_page_queue_free_mtx);
+		if (object->resident_page_count == 0)
+			vdrop(object->handle);
+		return;
+	}
 	m->flags |= PG_CACHED;
+	CTR3(KTR_DEBUG, "%s: set page %p flags %x", __func__, m, m->flags);
 	cnt.v_cache_count++;
-	cache_was_empty = vm_radix_is_empty(&object->cache);
-	vm_radix_insert(&object->cache, m);
+	PCPU_INC(cnt.v_tcached);
 #if VM_NRESERVLEVEL > 0
 	if (!vm_reserv_free_page(m)) {
 #else
@@ -2776,11 +3012,8 @@ vm_page_cowfault(vm_page_t m)
 	pindex = m->pindex;

 retry_alloc:
-	pmap_remove_all(m);
-	vm_page_remove(m);
-	mnew = vm_page_alloc(object, pindex, VM_ALLOC_NORMAL | VM_ALLOC_NOBUSY);
+	mnew = vm_page_alloc(NULL, pindex, VM_ALLOC_NORMAL | VM_ALLOC_NOOBJ);
 	if (mnew == NULL) {
-		vm_page_insert(m, object, pindex);
 		vm_page_unlock(m);
 		VM_OBJECT_WUNLOCK(object);
 		VM_WAIT;
@@ -2806,8 +3039,14 @@ vm_page_cowfault(vm_page_t m)
 		vm_page_lock(mnew);
 		vm_page_free(mnew);
 		vm_page_unlock(mnew);
-		vm_page_insert(m, object, pindex);
 	} else { /* clear COW & copy page */
+		pmap_remove_all(m);
+		mnew->object = object;
+		if (object->memattr != VM_MEMATTR_DEFAULT &&
+		    (object->flags & OBJ_FICTITIOUS) == 0)
+			pmap_page_set_memattr(mnew, object->memattr);
+		if (vm_page_replace(mnew, object, pindex) != m)
+			panic("vm_page_cowfault: invalid page replacement");
 		if (!so_zerocp_fullpage)
 			pmap_copy_page(m, mnew);
 		mnew->valid = VM_PAGE_BITS_ALL;
Index: /usr/src/sys/vm/vm_page.h
===================================================================
--- /usr/src/sys/vm/vm_page.h	(revision 253382)
+++ /usr/src/sys/vm/vm_page.h	(working copy)
@@ -388,7 +388,7 @@ void vm_page_dequeue_locked(vm_page_t m);
 vm_page_t vm_page_find_least(vm_object_t, vm_pindex_t);
 vm_page_t vm_page_getfake(vm_paddr_t paddr, vm_memattr_t memattr);
 void vm_page_initfake(vm_page_t m, vm_paddr_t paddr, vm_memattr_t memattr);
-void vm_page_insert (vm_page_t, vm_object_t, vm_pindex_t);
+int vm_page_insert (vm_page_t, vm_object_t, vm_pindex_t);
 boolean_t vm_page_is_cached(vm_object_t object, vm_pindex_t pindex);
 vm_page_t vm_page_lookup (vm_object_t, vm_pindex_t);
 vm_page_t vm_page_next(vm_page_t m);
@@ -398,7 +398,9 @@ void vm_page_putfake(vm_page_t m);
 void vm_page_readahead_finish(vm_page_t m);
 void vm_page_reference(vm_page_t m);
 void vm_page_remove (vm_page_t);
-void vm_page_rename (vm_page_t, vm_object_t, vm_pindex_t);
+int vm_page_rename (vm_page_t, vm_object_t, vm_pindex_t);
+vm_page_t vm_page_replace(vm_page_t mnew, vm_object_t object,
+    vm_pindex_t pindex);
 void vm_page_requeue(vm_page_t m);
 void vm_page_requeue_locked(vm_page_t m);
 void vm_page_set_valid_range(vm_page_t m, int base, int size);
Index: /usr/src/sys/vm/vm_radix.c
===================================================================
--- /usr/src/sys/vm/vm_radix.c	(revision 253382)
+++ /usr/src/sys/vm/vm_radix.c	(working copy)
@@ -103,8 +103,7 @@ struct vm_radix_node {
 static uma_zone_t vm_radix_node_zone;

 /*
- * Allocate a radix node.  Pre-allocation should ensure that the request
- * will always be satisfied.
+ * Allocate a radix node.
  */
 static __inline struct vm_radix_node *
@@ -112,21 +111,8 @@ vm_radix_node_get(vm_pindex_t owner, uint16_t coun
 	struct vm_radix_node *rnode;

 	rnode = uma_zalloc(vm_radix_node_zone, M_NOWAIT);
-
-	/*
-	 * The required number of nodes should already be pre-allocated
-	 * by vm_radix_prealloc().  However, UMA can hold a few nodes
-	 * in per-CPU buckets, which will not be accessible by the
-	 * current CPU.  Thus, the allocation could return NULL when
-	 * the pre-allocated pool is close to exhaustion.  Anyway,
-	 * in practice this should never occur because a new node
-	 * is not always required for insert.  Thus, the pre-allocated
-	 * pool should have some extra pages that prevent this from
-	 * becoming a problem.
-	 */
 	if (rnode == NULL)
-		panic("%s: uma_zalloc() returned NULL for a new node",
-		    __func__);
+		return (NULL);
 	rnode->rn_owner = owner;
 	rnode->rn_count = count;
 	rnode->rn_clev = clevel;
@@ -308,31 +294,33 @@ vm_radix_node_zone_init(void *mem, int size __unus
 	return (0);
 }

+#ifndef UMA_MD_SMALL_ALLOC
 /*
- * Pre-allocate intermediate nodes from the UMA slab zone.
+ * Reserve the KVA necessary to satisfy the node allocation.
+ * This is mandatory in architectures not supporting direct
+ * mapping as they will need otherwise to carve into the kernel maps for
+ * every node allocation, resulting into deadlocks for consumers already
+ * working with kernel maps.
  */
 static void
-vm_radix_prealloc(void *arg __unused)
+vm_radix_reserve_kva(void *arg __unused)
 {
-	int nodes;

 	/*
 	 * Calculate the number of reserved nodes, discounting the pages that
 	 * are needed to store them.
 	 */
-	nodes = ((vm_paddr_t)cnt.v_page_count * PAGE_SIZE) / (PAGE_SIZE +
-	    sizeof(struct vm_radix_node));
-	if (!uma_zone_reserve_kva(vm_radix_node_zone, nodes))
+	if (!uma_zone_reserve_kva(vm_radix_node_zone,
+	    ((vm_paddr_t)cnt.v_page_count * PAGE_SIZE) / (PAGE_SIZE +
+	    sizeof(struct vm_radix_node))))
 		panic("%s: unable to create new zone", __func__);
-	uma_prealloc(vm_radix_node_zone, nodes);
 }
-SYSINIT(vm_radix_prealloc, SI_SUB_KMEM, SI_ORDER_SECOND, vm_radix_prealloc,
-    NULL);
+SYSINIT(vm_radix_reserve_kva, SI_SUB_KMEM, SI_ORDER_SECOND,
+    vm_radix_reserve_kva, NULL);
+#endif

 /*
  * Initialize the UMA slab zone.
- * Until vm_radix_prealloc() is called, the zone will be served by the
- * UMA boot-time pre-allocated pool of pages.
  */
 void
 vm_radix_init(void)
@@ -345,8 +333,7 @@ vm_radix_init(void)
 #else
 	    NULL,
 #endif
-	    vm_radix_node_zone_init, NULL, VM_RADIX_PAD, UMA_ZONE_VM |
-	    UMA_ZONE_NOFREE);
+	    vm_radix_node_zone_init, NULL, VM_RADIX_PAD, UMA_ZONE_VM);
 }

 /*
@@ -353,7 +340,7 @@
  * Inserts the key-value pair into the trie.
  * Panics if the key already exists.
  */
-void
+int
 vm_radix_insert(struct vm_radix *rtree, vm_page_t page)
 {
 	vm_pindex_t index, newind;
@@ -372,7 +359,7 @@ vm_radix_insert(struct vm_radix *rtree, vm_page_t
 	rnode = vm_radix_getroot(rtree);
 	if (rnode == NULL) {
 		rtree->rt_root = (uintptr_t)page | VM_RADIX_ISLEAF;
-		return;
+		return (0);
 	}
 	parentp = (void **)&rtree->rt_root;
 	for (;;) {
@@ -384,10 +371,12 @@ vm_radix_insert(struct vm_radix *rtree, vm_page_t
 			clev = vm_radix_keydiff(m->pindex, index);
 			tmp = vm_radix_node_get(vm_radix_trimkey(index,
 			    clev + 1), 2, clev);
+			if (tmp == NULL)
+				return (ENOMEM);
 			*parentp = tmp;
 			vm_radix_addpage(tmp, index, clev, page);
 			vm_radix_addpage(tmp, m->pindex, clev, m);
-			return;
+			return (0);
 		} else if (vm_radix_keybarr(rnode, index))
 			break;
 		slot = vm_radix_slot(index, rnode->rn_clev);
@@ -394,7 +383,7 @@ vm_radix_insert(struct vm_radix *rtree, vm_page_t
 		if (rnode->rn_child[slot] == NULL) {
 			rnode->rn_count++;
 			vm_radix_addpage(rnode, index, rnode->rn_clev, page);
-			return;
+			return (0);
 		}
 		parentp = &rnode->rn_child[slot];
 		rnode = rnode->rn_child[slot];
@@ -409,10 +398,13 @@ vm_radix_insert(struct vm_radix *rtree, vm_page_t
 	clev = vm_radix_keydiff(newind, index);
 	tmp = vm_radix_node_get(vm_radix_trimkey(index, clev + 1), 2, clev);
+	if (tmp == NULL)
+		return (ENOMEM);
 	*parentp = tmp;
 	vm_radix_addpage(tmp, index, clev, page);
 	slot = vm_radix_slot(newind, clev);
 	tmp->rn_child[slot] = rnode;
+	return (0);
 }

 /*
@@ -739,6 +731,51 @@ vm_radix_reclaim_allnodes(struct vm_radix *rtree)
 	vm_radix_reclaim_allnodes_int(root);
 }

+/*
+ * Replace an existing page into the trie with another one.
+ * Panics if the replacing page is not present or if the new page has an
+ * invalid key.
+ */
+vm_page_t
+vm_radix_replace(struct vm_radix *rtree, vm_page_t newpage, vm_pindex_t index)
+{
+	struct vm_radix_node *rnode;
+	vm_page_t m;
+	int slot;
+
+	KASSERT(newpage->pindex == index, ("%s: newpage index invalid",
+	    __func__));
+
+	rnode = vm_radix_getroot(rtree);
+	if (rnode == NULL)
+		panic("%s: replacing page on an empty trie", __func__);
+	if (vm_radix_isleaf(rnode)) {
+		m = vm_radix_topage(rnode);
+		if (m->pindex != index)
+			panic("%s: original replacing root key not found",
+			    __func__);
+		rtree->rt_root = (uintptr_t)newpage | VM_RADIX_ISLEAF;
+		return (m);
+	}
+	for (;;) {
+		slot = vm_radix_slot(index, rnode->rn_clev);
+		if (vm_radix_isleaf(rnode->rn_child[slot])) {
+			m = vm_radix_topage(rnode->rn_child[slot]);
+			if (m->pindex == index) {
+				rnode->rn_child[slot] =
+				    (void *)((uintptr_t)newpage |
+				    VM_RADIX_ISLEAF);
+				return (m);
+			} else
+				break;
+		} else if (rnode->rn_child[slot] == NULL ||
+		    vm_radix_keybarr(rnode->rn_child[slot], index))
+			break;
+		rnode = rnode->rn_child[slot];
+	}
+	panic("%s: original replacing page not found", __func__);
+}
+
 #ifdef DDB
 /*
  * Show details about the given radix node.
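[Editorial sketch, not part of the patch: vm_radix_replace() above is what lets the pager paths (device_pager.c, sg_pager.c, vm_page_cowfault()) swap one page for another without any fallible insert; the new page atomically takes the old page's slot in the trie, so the pindex is never absent. A sketch of the intended consumer pattern, modeled on the old_dev_pager_fault() hunk earlier in this diff; the function name and arguments are illustrative.]

	static void
	example_swap_fake_page(vm_object_t object, vm_page_t *mres,
	    vm_paddr_t paddr, vm_memattr_t memattr)
	{
		vm_page_t page;

		page = vm_page_getfake(paddr, memattr);
		VM_OBJECT_WLOCK(object);
		/*
		 * vm_page_replace() returns the page it displaced; the
		 * caller is expected to free that page itself.
		 */
		if (vm_page_replace(page, object, (*mres)->pindex) != *mres)
			panic("example_swap_fake_page: wrong page displaced");
		vm_page_lock(*mres);
		vm_page_free(*mres);
		vm_page_unlock(*mres);
		*mres = page;
		VM_OBJECT_WUNLOCK(object);
	}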
Index: /usr/src/sys/vm/vm_radix.h
===================================================================
--- /usr/src/sys/vm/vm_radix.h	(revision 253382)
+++ /usr/src/sys/vm/vm_radix.h	(working copy)
@@ -36,12 +36,14 @@
 #ifdef _KERNEL

 void	vm_radix_init(void);
-void	vm_radix_insert(struct vm_radix *rtree, vm_page_t page);
+int	vm_radix_insert(struct vm_radix *rtree, vm_page_t page);
 vm_page_t	vm_radix_lookup(struct vm_radix *rtree, vm_pindex_t index);
 vm_page_t	vm_radix_lookup_ge(struct vm_radix *rtree, vm_pindex_t index);
 vm_page_t	vm_radix_lookup_le(struct vm_radix *rtree, vm_pindex_t index);
 void	vm_radix_reclaim_allnodes(struct vm_radix *rtree);
 void	vm_radix_remove(struct vm_radix *rtree, vm_pindex_t index);
+vm_page_t	vm_radix_replace(struct vm_radix *rtree, vm_page_t newpage,
+	    vm_pindex_t index);

 #endif /* _KERNEL */
 #endif /* !_VM_RADIX_H_ */

--
Test scenario: fts.sh