Date: Tue, 11 Apr 2017 12:22:19 +0300
From: Konstantin Belousov
To: Ed Maste
Cc: alc@freebsd.org, markj@freebsd.org
Subject: Re: pthread_setspecific crash
Message-ID: <20170411092218.GO1788@kib.kiev.ua>
References: <20170408065333.GC1788@kib.kiev.ua>
In-Reply-To: <20170408065333.GC1788@kib.kiev.ua>

I think I figured out the situation.

For context, below is an excerpt from yesterday's ktrace.out which
tracks the problematic syscalls:

  3959 100687 ld.lld   CALL  munmap(0x3800000,0x2000)
  3959 100621 ld.lld   CALL  mmap(0x7fffdf1f7000,0x201000,0x7,0x400,0xffffffff,0)
  3959 100687 ld.lld   RET   munmap 0
  3959 100621 ld.lld   RET   mmap 58720256/0x3800000
  3959 101369 ld.lld   CALL  mmap(0,0x1000,0x3,0x1002,0xffffffff,0)
  3959 100621 ld.lld   CALL  mprotect(0x3800000,0x1000,0)
  3959 101369 ld.lld   RET   mmap 58720256/0x3800000
  3959 100621 ld.lld   RET   mprotect 0

Thread 100687 unmaps something at the address, then thread 100621 does
mmap(MAP_STACK), which returns the same address, and in parallel thread
101369 does mmap(MAP_PRIVATE|MAP_ANON), which returns the same address
again.  Later, thread 100621 uses mprotect() to set up the guard page at
the very bottom of the returned stack, which causes thread 101369 to
fault when it accesses the address.

The problem is that thread 100621 (which sets up the stack for a new
thread) did not really allocate a map entry at that address.  The
current behaviour of mmap(MAP_STACK) is to allocate only the range
[top - sgrowsiz, top); the range [top - stacksize, top - sgrowsiz) is
kept as the space into which the stack is extended.  The randomizer for
new allocations is good enough to sometimes select the reserved grow
space of some stack and cause the issue.

It is not specific to the ASLR patch; nothing prevents the problem from
appearing on the stock system as well.  It is just that the chances are
negligible there, because the allocator currently selects consecutive
increasing addresses when it looks for free address space.

The whole stack grow idea is somewhat strange for modern times, because
I do not see any use for it besides possibly reporting a smaller VSS
than if the stack map entry were allocated as a whole at request time.
The pages are allocated lazily anyway.
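
To make the address arithmetic concrete, here is a minimal userland
sketch of the range split described above.  It is not kernel code: the
helper names (stack_entry, stack_grow_gap, range_contains) are
hypothetical, and the 128 KiB value for sgrowsiz is an assumption (the
usual kern.sgrowsiz default), not something shown in the trace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed kern.sgrowsiz default (128 KiB); not taken from the trace. */
#define SGROWSIZ_ASSUMED ((uint64_t)128 * 1024)

/* Half-open address range [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Part of a MAP_STACK request that gets a real map entry under the
 * current behaviour: [top - min(stacksize, sgrowsiz), top).
 */
static struct range
stack_entry(uint64_t bos, uint64_t stacksize, uint64_t sgrowsiz)
{
	uint64_t init, top;
	struct range r;

	assert(bos + stacksize >= bos);	/* no address wraparound */
	top = bos + stacksize;
	init = stacksize < sgrowsiz ? stacksize : sgrowsiz;
	r.start = top - init;
	r.end = top;
	return (r);
}

/* Grow space below the entry: [bos, top - init); empty if start == end. */
static struct range
stack_grow_gap(uint64_t bos, uint64_t stacksize, uint64_t sgrowsiz)
{
	struct range e, r;

	e = stack_entry(bos, stacksize, sgrowsiz);
	r.start = bos;
	r.end = e.start;
	return (r);
}

/* True if [addr, addr + len) lies entirely inside r. */
static bool
range_contains(struct range r, uint64_t addr, uint64_t len)
{
	uint64_t size;

	size = r.end - r.start;
	return (addr >= r.start && len <= size &&
	    addr - r.start <= size - len);
}
```

With the values from the trace (stack bottom 0x3800000, size 0x201000)
and the assumed sgrowsiz, the map entry covers [0x39e1000, 0x3a01000),
and the 0x1000-byte anonymous mapping at 0x3800000 falls inside the
grow gap [0x3800000, 0x39e1000), where nothing stops another mapping
from being placed.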

In fact, it would make a difference for mlockall() processes, since
currently such processes take page faults and may be put to sleep for a
page allocation when the stack needs to grow.  IMO this is not right.

Below is the conservative update to the ASLR patch, to pre-allocate the
map entry for the whole stack range when ASLR is used for the process.
Non-ASLR processes grow the stack as before.

diff --git a/sys/amd64/amd64/elf_machdep.c b/sys/amd64/amd64/elf_machdep.c
index 1c3460eb031..d2c52ad373d 100644
--- a/sys/amd64/amd64/elf_machdep.c
+++ b/sys/amd64/amd64/elf_machdep.c
@@ -72,7 +72,8 @@ struct sysentvec elf64_freebsd_sysvec = {
 	.sv_setregs = exec_setregs,
 	.sv_fixlimit = NULL,
 	.sv_maxssiz = NULL,
-	.sv_flags = SV_ABI_FREEBSD | SV_LP64 | SV_SHP | SV_TIMEKEEP,
+	.sv_flags = SV_ABI_FREEBSD | SV_ASLR | SV_LP64 | SV_SHP |
+			    SV_TIMEKEEP,
 	.sv_set_syscall_retval = cpu_set_syscall_retval,
 	.sv_fetch_syscall_args = cpu_fetch_syscall_args,
 	.sv_syscallnames = syscallnames,
diff --git a/sys/arm/arm/elf_machdep.c b/sys/arm/arm/elf_machdep.c
index a962ee882e1..bfd84b6783c 100644
--- a/sys/arm/arm/elf_machdep.c
+++ b/sys/arm/arm/elf_machdep.c
@@ -75,7 +75,7 @@ struct sysentvec elf32_freebsd_sysvec = {
 	.sv_maxssiz = NULL,
 	.sv_flags =
 #if __ARM_ARCH >= 6
-			  SV_SHP | SV_TIMEKEEP |
+			  SV_ASLR | SV_SHP | SV_TIMEKEEP |
 #endif
 			  SV_ABI_FREEBSD | SV_ILP32,
 	.sv_set_syscall_retval = cpu_set_syscall_retval,
diff --git a/sys/compat/freebsd32/freebsd32_misc.c b/sys/compat/freebsd32/freebsd32_misc.c
index 08a072fbfa2..9ef028aa00e 100644
--- a/sys/compat/freebsd32/freebsd32_misc.c
+++ b/sys/compat/freebsd32/freebsd32_misc.c
@@ -3011,6 +3011,7 @@ freebsd32_procctl(struct thread *td, struct freebsd32_procctl_args *uap)
 	int error, error1, flags;
 
 	switch (uap->com) {
+	case PROC_ASLR_CTL:
 	case PROC_SPROTECT:
 	case PROC_TRACE_CTL:
 	case PROC_TRAPCAP_CTL:
@@ -3042,6 +3043,7 @@ freebsd32_procctl(struct thread *td, struct freebsd32_procctl_args *uap)
 			return (error);
 		data = &x.rk;
 		break;
+	case PROC_ASLR_STATUS:
 	case PROC_TRACE_STATUS:
 	case PROC_TRAPCAP_STATUS:
 		data = &flags;
@@ -3061,6 +3063,7 @@ freebsd32_procctl(struct thread *td, struct freebsd32_procctl_args *uap)
 		if (error == 0)
 			error = error1;
 		break;
+	case PROC_ASLR_STATUS:
 	case PROC_TRACE_STATUS:
 	case PROC_TRAPCAP_STATUS:
 		if (error == 0)
diff --git a/sys/compat/ia32/ia32_sysvec.c b/sys/compat/ia32/ia32_sysvec.c
index b6faf86def3..ffe7ebf16a4 100644
--- a/sys/compat/ia32/ia32_sysvec.c
+++ b/sys/compat/ia32/ia32_sysvec.c
@@ -120,7 +120,7 @@ struct sysentvec ia32_freebsd_sysvec = {
 	.sv_setregs = ia32_setregs,
 	.sv_fixlimit = ia32_fixlimit,
 	.sv_maxssiz = &ia32_maxssiz,
-	.sv_flags = SV_ABI_FREEBSD | SV_IA32 | SV_ILP32 |
+	.sv_flags = SV_ABI_FREEBSD | SV_ASLR | SV_IA32 | SV_ILP32 |
 			    SV_SHP | SV_TIMEKEEP,
 	.sv_set_syscall_retval = ia32_set_syscall_retval,
 	.sv_fetch_syscall_args = ia32_fetch_syscall_args,
diff --git a/sys/i386/i386/elf_machdep.c b/sys/i386/i386/elf_machdep.c
index fcac38a00b4..d56d96a0238 100644
--- a/sys/i386/i386/elf_machdep.c
+++ b/sys/i386/i386/elf_machdep.c
@@ -74,8 +74,8 @@ struct sysentvec elf32_freebsd_sysvec = {
 	.sv_setregs = exec_setregs,
 	.sv_fixlimit = NULL,
 	.sv_maxssiz = NULL,
-	.sv_flags = SV_ABI_FREEBSD | SV_IA32 | SV_ILP32 | SV_SHP |
-			    SV_TIMEKEEP,
+	.sv_flags = SV_ABI_FREEBSD | SV_ASLR | SV_IA32 | SV_ILP32 |
+			    SV_SHP | SV_TIMEKEEP,
 	.sv_set_syscall_retval = cpu_set_syscall_retval,
 	.sv_fetch_syscall_args = cpu_fetch_syscall_args,
 	.sv_syscallnames = syscallnames,
diff --git a/sys/kern/imgact_elf.c b/sys/kern/imgact_elf.c
index dedc3660e17..f5be5b7061d 100644
--- a/sys/kern/imgact_elf.c
+++ b/sys/kern/imgact_elf.c
@@ -137,6 +137,23 @@ SYSCTL_INT(_kern_elf32, OID_AUTO, read_exec, CTLFLAG_RW, &i386_read_exec, 0,
 #endif
 #endif
 
+static int __elfN(aslr_enabled) = 1;
+SYSCTL_INT(__CONCAT(_kern_elf, __ELF_WORD_SIZE), OID_AUTO,
+    aslr_enabled, CTLFLAG_RWTUN, &__elfN(aslr_enabled), 0,
+    __XSTRING(__CONCAT(ELF, __ELF_WORD_SIZE))
+    ": enable address map randomization");
+
+static int __elfN(pie_aslr_enabled) = 1;
+SYSCTL_INT(__CONCAT(_kern_elf, __ELF_WORD_SIZE), OID_AUTO,
+    pie_aslr_enabled, CTLFLAG_RWTUN, &__elfN(pie_aslr_enabled), 0,
+    __XSTRING(__CONCAT(ELF, __ELF_WORD_SIZE))
+    ": enable address map randomization for PIE binaries");
+
+static int __elfN(aslr_honor_sbrk) = 0;
+SYSCTL_INT(__CONCAT(_kern_elf, __ELF_WORD_SIZE), OID_AUTO,
+    aslr_honor_sbrk, CTLFLAG_RW, &__elfN(aslr_honor_sbrk), 0,
+    __XSTRING(__CONCAT(ELF, __ELF_WORD_SIZE)) ": assume sbrk is used");
+
 static Elf_Brandinfo *elf_brand_list[MAX_BRANDS];
 
 #define trunc_page_ps(va, ps) rounddown2(va, ps)
@@ -770,6 +787,30 @@ fail:
 	return (error);
 }
 
+static u_long
+__CONCAT(rnd_, __elfN(base))(u_long base, u_long minv, u_long maxv,
+    u_int align)
+{
+	u_long rbase, res;
+
+	arc4rand(&rbase, sizeof(rbase), 0);
+	res = base + rbase % (maxv - minv);
+	res &= ~((u_long)align - 1);
+	KASSERT(res >= base,
+	    ("res %#lx < base %#lx, minv %#lx maxv %#lx rbase %#lx",
+	    res, base, minv, maxv, rbase));
+	KASSERT(res < maxv,
+	    ("res %#lx > maxv %#lx, minv %#lx base %#lx rbase %#lx",
+	    res, maxv, minv, base, rbase));
+	return (res);
+}
+
+/*
+ * Impossible et_dyn_addr initial value indicating that the real base
+ * must be calculated later with some randomization applied.
+ */
+#define ET_DYN_ADDR_RAND 1
+
 static int
 __CONCAT(exec_, __elfN(imgact))(struct image_params *imgp)
 {
@@ -778,6 +819,7 @@ __CONCAT(exec_, __elfN(imgact))(struct image_params *imgp)
 	const Elf_Phdr *phdr;
 	Elf_Auxargs *elf_auxargs;
 	struct vmspace *vmspace;
+	vm_map_t map;
 	const char *err_str, *newinterp;
 	char *interp, *interp_buf, *path;
 	Elf_Brandinfo *brand_info;
@@ -785,6 +827,7 @@ __CONCAT(exec_, __elfN(imgact))(struct image_params *imgp)
 	vm_prot_t prot;
 	u_long text_size, data_size, total_size, text_addr, data_addr;
 	u_long seg_size, seg_addr, addr, baddr, et_dyn_addr, entry, proghdr;
+	u_long maxalign, mapsz, maxv;
 	int32_t osrel;
 	int error, i, n, interp_name_len, have_interp;
 
@@ -826,12 +869,17 @@ __CONCAT(exec_, __elfN(imgact))(struct image_params *imgp)
 	err_str = newinterp = NULL;
 	interp = interp_buf = NULL;
 	td = curthread;
+	maxalign = PAGE_SIZE;
+	mapsz = 0;
 
 	for (i = 0; i < hdr->e_phnum; i++) {
 		switch (phdr[i].p_type) {
 		case PT_LOAD:
 			if (n == 0)
 				baddr = phdr[i].p_vaddr;
+			if (phdr[i].p_align > maxalign)
+				maxalign = phdr[i].p_align;
+			mapsz += phdr[i].p_memsz;
 			n++;
 			break;
 		case PT_INTERP:
@@ -885,6 +933,7 @@ __CONCAT(exec_, __elfN(imgact))(struct image_params *imgp)
 		error = ENOEXEC;
 		goto ret;
 	}
+	sv = brand_info->sysvec;
 	et_dyn_addr = 0;
 	if (hdr->e_type == ET_DYN) {
 		if ((brand_info->flags & BI_CAN_EXEC_DYN) == 0) {
@@ -896,10 +945,17 @@ __CONCAT(exec_, __elfN(imgact))(struct image_params *imgp)
 		 * Honour the base load address from the dso if it is
 		 * non-zero for some reason.
 		 */
-		if (baddr == 0)
-			et_dyn_addr = ET_DYN_LOAD_ADDR;
+		if (baddr == 0) {
+			if ((sv->sv_flags & SV_ASLR) == 0)
+				et_dyn_addr = ET_DYN_LOAD_ADDR;
+			else if ((__elfN(pie_aslr_enabled) &&
+			    (imgp->proc->p_flag2 & P2_ASLR_DISABLE) == 0) ||
+			    (imgp->proc->p_flag2 & P2_ASLR_ENABLE) != 0)
+				et_dyn_addr = ET_DYN_ADDR_RAND;
+			else
+				et_dyn_addr = ET_DYN_LOAD_ADDR;
+		}
 	}
-	sv = brand_info->sysvec;
 	if (interp != NULL && brand_info->interp_newpath != NULL)
 		newinterp = brand_info->interp_newpath;
 
@@ -916,9 +972,53 @@ __CONCAT(exec_, __elfN(imgact))(struct image_params *imgp)
 	 */
 	VOP_UNLOCK(imgp->vp, 0);
 
-	error = exec_new_vmspace(imgp, sv);
 	imgp->proc->p_sysent = sv;
 
+	/*
+	 * Decide to enable randomization of user mappings. First,
+	 * reset user preferences for the setid binaries. Then,
+	 * account for the support of the randomization by the ABI, by
+	 * user preferences, and make special treatment for PIE
+	 * binaries.
+	 */
+	if (imgp->credential_setid) {
+		PROC_LOCK(imgp->proc);
+		imgp->proc->p_flag2 &= ~(P2_ASLR_ENABLE | P2_ASLR_DISABLE);
+		PROC_UNLOCK(imgp->proc);
+	}
+	if ((sv->sv_flags & SV_ASLR) == 0 ||
+	    (imgp->proc->p_flag2 & P2_ASLR_DISABLE) != 0) {
+		KASSERT(et_dyn_addr != ET_DYN_ADDR_RAND,
+		    ("et_dyn_addr == RAND and !ASLR"));
+	} else if ((imgp->proc->p_flag2 & P2_ASLR_ENABLE) != 0 ||
+	    (__elfN(aslr_enabled) && hdr->e_type == ET_EXEC) ||
+	    et_dyn_addr == ET_DYN_ADDR_RAND) {
+		imgp->map_flags |= MAP_ASLR;
+		/*
+		 * If user does not care about sbrk, utilize the bss
+		 * grow region for mappings as well. We can select
+		 * the base for the image anywere and still not suffer
+		 * from the fragmentation.
+		 */
+		if (!__elfN(aslr_honor_sbrk) ||
+		    (imgp->proc->p_flag2 & P2_ASLR_IGNSTART) != 0)
+			imgp->map_flags |= MAP_ASLR_IGNSTART;
+	}
+
+	error = exec_new_vmspace(imgp, sv);
+	vmspace = imgp->proc->p_vmspace;
+	map = &vmspace->vm_map;
+
+	maxv = vm_map_max(map) - lim_max(td, RLIMIT_STACK);
+	if (et_dyn_addr == ET_DYN_ADDR_RAND) {
+		KASSERT((map->flags & MAP_ASLR) != 0,
+		    ("ET_DYN_ADDR_RAND but !MAP_ASLR"));
+		et_dyn_addr = __CONCAT(rnd_, __elfN(base))(vm_map_min(map),
+		    vm_map_min(map) + mapsz + lim_max(td, RLIMIT_DATA),
+		    /* reserve half of the address space to interpreter */
+		    maxv / 2, 1UL << flsl(maxalign));
+	}
+
 	vn_lock(imgp->vp, LK_EXCLUSIVE | LK_RETRY);
 	if (error != 0)
 		goto ret;
@@ -1010,7 +1110,6 @@ __CONCAT(exec_, __elfN(imgact))(struct image_params *imgp)
 		goto ret;
 	}
 
-	vmspace = imgp->proc->p_vmspace;
 	vmspace->vm_tsize = text_size >> PAGE_SHIFT;
 	vmspace->vm_taddr = (caddr_t)(uintptr_t)text_addr;
 	vmspace->vm_dsize = data_size >> PAGE_SHIFT;
@@ -1031,6 +1130,11 @@ __CONCAT(exec_, __elfN(imgact))(struct image_params *imgp)
 	if (interp != NULL) {
 		have_interp = FALSE;
 		VOP_UNLOCK(imgp->vp, 0);
+		if ((map->flags & MAP_ASLR) != 0) {
+			addr = __CONCAT(rnd_, __elfN(base))(addr, addr,
+			    /* Assume that interpeter fits into 1/4 of AS */
+			    (maxv + addr) / 2, PAGE_SIZE);
+		}
 		if (brand_info->emul_path != NULL &&
 		    brand_info->emul_path[0] != '\0') {
 			path = malloc(MAXPATHLEN, M_TEMP, M_WAITOK);
diff --git a/sys/kern/kern_exec.c b/sys/kern/kern_exec.c
index 7b5b0786dec..f628751eddf 100644
--- a/sys/kern/kern_exec.c
+++ b/sys/kern/kern_exec.c
@@ -1097,6 +1097,7 @@ exec_new_vmspace(imgp, sv)
 		shmexit(vmspace);
 		pmap_remove_pages(vmspace_pmap(vmspace));
 		vm_map_remove(map, vm_map_min(map), vm_map_max(map));
+		map->flags &= ~(MAP_ASLR | MAP_ASLR_IGNSTART);
 	} else {
 		error = vmspace_exec(p, sv_minuser, sv->sv_maxuser);
 		if (error)
@@ -1104,6 +1105,7 @@ exec_new_vmspace(imgp, sv)
 		vmspace = p->p_vmspace;
 		map = &vmspace->vm_map;
 	}
+	map->flags |= imgp->map_flags;
 
 	/* Map a shared page */
 	obj = sv->sv_shared_page_obj;
diff --git a/sys/kern/kern_fork.c b/sys/kern/kern_fork.c
index 92bbcd72cea..0810deca613 100644
--- a/sys/kern/kern_fork.c
+++ b/sys/kern/kern_fork.c
@@ -497,7 +497,8 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread *
 	 * Increase reference counts on shared objects.
 	 */
 	p2->p_flag = P_INMEM;
-	p2->p_flag2 = p1->p_flag2 & (P2_NOTRACE | P2_NOTRACE_EXEC | P2_TRAPCAP);
+	p2->p_flag2 = p1->p_flag2 & (P2_ASLR_DISABLE | P2_ASLR_ENABLE |
+	    P2_ASLR_IGNSTART | P2_NOTRACE | P2_NOTRACE_EXEC | P2_TRAPCAP);
 	p2->p_swtick = ticks;
 	if (p1->p_flag & P_PROFIL)
 		startprofclock(p2);
diff --git a/sys/kern/kern_procctl.c b/sys/kern/kern_procctl.c
index e8751701363..f4b79ea334b 100644
--- a/sys/kern/kern_procctl.c
+++ b/sys/kern/kern_procctl.c
@@ -43,6 +43,11 @@ __FBSDID("$FreeBSD$");
 #include
 #include
 
+#include
+#include
+#include
+#include
+
 static int
 protect_setchild(struct thread *td, struct proc *p, int flags)
 {
@@ -364,6 +369,62 @@ trapcap_status(struct thread *td, struct proc *p, int *data)
 	return (0);
 }
 
+static int
+aslr_ctl(struct thread *td, struct proc *p, int state)
+{
+
+	PROC_LOCK_ASSERT(p, MA_OWNED);
+
+	switch (state) {
+	case PROC_ASLR_FORCE_ENABLE:
+		p->p_flag2 &= ~P2_ASLR_DISABLE;
+		p->p_flag2 |= P2_ASLR_ENABLE;
+		break;
+	case PROC_ASLR_FORCE_DISABLE:
+		p->p_flag2 |= P2_ASLR_DISABLE;
+		p->p_flag2 &= ~P2_ASLR_ENABLE;
+		break;
+	case PROC_ASLR_NOFORCE:
+		p->p_flag2 &= ~(P2_ASLR_ENABLE | P2_ASLR_DISABLE);
+		break;
+	default:
+		return (EINVAL);
+	}
+	return (0);
+}
+
+static int
+aslr_status(struct thread *td, struct proc *p, int *data)
+{
+	struct vmspace *vm;
+	int d;
+
+	switch (p->p_flag2 & (P2_ASLR_ENABLE | P2_ASLR_DISABLE)) {
+	case 0:
+		d = PROC_ASLR_NOFORCE;
+		break;
+	case P2_ASLR_ENABLE:
+		d = PROC_ASLR_FORCE_ENABLE;
+		break;
+	case P2_ASLR_DISABLE:
+		d = PROC_ASLR_FORCE_DISABLE;
+		break;
+	}
+	if ((p->p_flag & P_WEXIT) == 0) {
+		_PHOLD(p);
+		PROC_UNLOCK(p);
+		vm = vmspace_acquire_ref(p);
+		if (vm != NULL && (vm->vm_map.flags & MAP_ASLR) != 0) {
+			d |= PROC_ASLR_ACTIVE;
+			vmspace_free(vm);
+		}
+		PROC_LOCK(p);
+		_PRELE(p);
+	}
+	*data = d;
+	return (0);
+}
+
 #ifndef _SYS_SYSPROTO_H_
 struct procctl_args {
 	idtype_t idtype;
@@ -385,6 +446,7 @@ sys_procctl(struct thread *td, struct procctl_args *uap)
 	int error, error1, flags;
 
 	switch (uap->com) {
+	case PROC_ASLR_CTL:
 	case PROC_SPROTECT:
 	case PROC_TRACE_CTL:
 	case PROC_TRAPCAP_CTL:
@@ -414,6 +476,7 @@ sys_procctl(struct thread *td, struct procctl_args *uap)
 			return (error);
 		data = &x.rk;
 		break;
+	case PROC_ASLR_STATUS:
 	case PROC_TRACE_STATUS:
 	case PROC_TRAPCAP_STATUS:
 		data = &flags;
@@ -432,6 +495,7 @@ sys_procctl(struct thread *td, struct procctl_args *uap)
 		if (error == 0)
 			error = error1;
 		break;
+	case PROC_ASLR_STATUS:
 	case PROC_TRACE_STATUS:
 	case PROC_TRAPCAP_STATUS:
 		if (error == 0)
@@ -447,6 +511,10 @@ kern_procctl_single(struct thread *td, struct proc *p, int com, void *data)
 
 	PROC_LOCK_ASSERT(p, MA_OWNED);
 	switch (com) {
+	case PROC_ASLR_CTL:
+		return (aslr_ctl(td, p, *(int *)data));
+	case PROC_ASLR_STATUS:
+		return (aslr_status(td, p, data));
 	case PROC_SPROTECT:
 		return (protect_set(td, p, *(int *)data));
 	case PROC_REAP_ACQUIRE:
@@ -481,6 +549,8 @@ kern_procctl(struct thread *td, idtype_t idtype, id_t id, int com, void *data)
 	bool tree_locked;
 
 	switch (com) {
+	case PROC_ASLR_CTL:
+	case PROC_ASLR_STATUS:
 	case PROC_REAP_ACQUIRE:
 	case PROC_REAP_RELEASE:
 	case PROC_REAP_STATUS:
@@ -507,6 +577,8 @@ kern_procctl(struct thread *td, idtype_t idtype, id_t id, int com, void *data)
 		sx_xlock(&proctree_lock);
 		tree_locked = true;
 		break;
+	case PROC_ASLR_CTL:
+	case PROC_ASLR_STATUS:
 	case PROC_TRACE_STATUS:
 	case PROC_TRAPCAP_STATUS:
 		tree_locked = false;
diff --git a/sys/sys/imgact.h b/sys/sys/imgact.h
index 6970a8e4fd4..2db09cbe14c 100644
--- a/sys/sys/imgact.h
+++ b/sys/sys/imgact.h
@@ -87,6 +87,7 @@ struct image_params {
 	u_long stack_sz;
 	struct ucred *newcred; /* new credentials if changing */
 	bool credential_setid; /* true if becoming setid */
+	u_int map_flags;
 };
 
 #ifdef _KERNEL
diff --git a/sys/sys/proc.h b/sys/sys/proc.h
index 0362552cf92..8ea31e0ea6e 100644
--- a/sys/sys/proc.h
+++ b/sys/sys/proc.h
@@ -729,6 +729,9 @@ struct proc {
 #define P2_AST_SU 0x00000008 /* Handles SU ast for kthreads. */
 #define P2_PTRACE_FSTP 0x00000010 /* SIGSTOP from PT_ATTACH not yet handled. */
 #define P2_TRAPCAP 0x00000020 /* SIGTRAP on ENOTCAPABLE */
+#define P2_ASLR_ENABLE 0x00000040 /* Force enable ASLR. */
+#define P2_ASLR_DISABLE 0x00000080 /* Force disable ASLR. */
+#define P2_ASLR_IGNSTART 0x00000100 /* Enable ASLR to consume sbrk area. */
 
 /* Flags protected by proctree_lock, kept in p_treeflags. */
 #define P_TREE_ORPHANED 0x00000001 /* Reparented, on orphan list */
diff --git a/sys/sys/procctl.h b/sys/sys/procctl.h
index 53bb6caa66f..c59561fa422 100644
--- a/sys/sys/procctl.h
+++ b/sys/sys/procctl.h
@@ -49,6 +49,8 @@
 #define PROC_TRACE_STATUS 8 /* query tracing status */
 #define PROC_TRAPCAP_CTL 9 /* trap capability errors */
 #define PROC_TRAPCAP_STATUS 10 /* query trap capability status */
+#define PROC_ASLR_CTL 11 /* en/dis ASLR */
+#define PROC_ASLR_STATUS 12 /* query ASLR status */
 
 /* Operations for PROC_SPROTECT (passed in integer arg). */
 #define PPROT_OP(x) ((x) & 0xf)
@@ -111,6 +113,11 @@ struct procctl_reaper_kill {
 #define PROC_TRAPCAP_CTL_ENABLE 1
 #define PROC_TRAPCAP_CTL_DISABLE 2
 
+#define PROC_ASLR_FORCE_ENABLE 1
+#define PROC_ASLR_FORCE_DISABLE 2
+#define PROC_ASLR_NOFORCE 3
+#define PROC_ASLR_ACTIVE 0x80000000
+
 #ifndef _KERNEL
 __BEGIN_DECLS
 int procctl(idtype_t, id_t, int, void *);
diff --git a/sys/sys/sysent.h b/sys/sys/sysent.h
index 643717603fd..fdf9d332d73 100644
--- a/sys/sys/sysent.h
+++ b/sys/sys/sysent.h
@@ -139,6 +139,7 @@ struct sysentvec {
 #define SV_SHP 0x010000 /* Shared page. */
 #define SV_CAPSICUM 0x020000 /* Force cap_enter() on startup. */
 #define SV_TIMEKEEP 0x040000 /* Shared page timehands. */
+#define SV_ASLR 0x080000 /* ASLR allowed. */
 
 #define SV_ABI_MASK 0xff
 #define SV_ABI_ERRNO(p, e) ((p)->p_sysent->sv_errsize <= 0 ? e : \
diff --git a/sys/vm/vm_map.c b/sys/vm/vm_map.c
index c6fc5d30e79..0e8dc153473 100644
--- a/sys/vm/vm_map.c
+++ b/sys/vm/vm_map.c
@@ -1466,6 +1466,20 @@ vm_map_fixed(vm_map_t map, vm_object_t object, vm_ooffset_t offset,
 	return (result);
 }
 
+static const int aslr_pages_rnd_64[2] = {0x1000, 0x10};
+static const int aslr_pages_rnd_32[2] = {0x100, 0x4};
+
+static int aslr_sloppiness = 5;
+SYSCTL_INT(_vm, OID_AUTO, aslr_sloppiness, CTLFLAG_RW, &aslr_sloppiness, 0,
+    "");
+
+static int aslr_collapse_anon = 1;
+SYSCTL_INT(_vm, OID_AUTO, aslr_collapse_anon, CTLFLAG_RW,
+    &aslr_collapse_anon, 0,
+    "");
+
+#define MAP_32BIT_MAX_ADDR ((vm_offset_t)1 << 31)
+
 /*
  * vm_map_find finds an unallocated region in the target address
  * map with the given length. The search is defined to be
@@ -1481,8 +1495,11 @@ vm_map_find(vm_map_t map, vm_object_t object, vm_ooffset_t offset,
 	    vm_size_t length, vm_offset_t max_addr, int find_space,
 	    vm_prot_t prot, vm_prot_t max, int cow)
 {
-	vm_offset_t alignment, initial_addr, start;
-	int result;
+	vm_map_entry_t prev_entry;
+	vm_offset_t alignment, addr_save, start, start1, rand_max, re;
+	const int *aslr_pages_rnd;
+	int result, do_aslr, pidx;
+	bool en_aslr, anon;
 
 	KASSERT((cow & (MAP_STACK_GROWS_DOWN | MAP_STACK_GROWS_UP)) == 0 ||
 	    object == NULL,
@@ -1495,21 +1512,86 @@ vm_map_find(vm_map_t map, vm_object_t object, vm_ooffset_t offset,
 		alignment = (vm_offset_t)1 << (find_space >> 8);
 	} else
 		alignment = 0;
-	initial_addr = *addr;
+	do_aslr = (map->flags & MAP_ASLR) != 0 ? aslr_sloppiness : 0;
+	en_aslr = do_aslr != 0;
+	anon = object == NULL && (cow & (MAP_INHERIT_SHARE |
+	    MAP_STACK_GROWS_UP | MAP_STACK_GROWS_DOWN)) == 0 &&
+	    prot != PROT_NONE && aslr_collapse_anon;
+	addr_save = *addr;
+	if (en_aslr) {
+		if (vm_map_max(map) > MAP_32BIT_MAX_ADDR &&
+		    (max_addr == 0 || max_addr > MAP_32BIT_MAX_ADDR))
+			aslr_pages_rnd = aslr_pages_rnd_64;
+		else
+			aslr_pages_rnd = aslr_pages_rnd_32;
+		if (find_space != VMFS_NO_SPACE && (map->flags &
+		    MAP_ASLR_IGNSTART) != 0) {
+			start = anon ? map->anon_loc : vm_map_min(map);
+		} else {
+			start = anon && *addr == 0 ? map->anon_loc : addr_save;
+		}
+	} else {
+		start = addr_save;
+	}
+	start1 = start; /* for again_any_space restart */
 again:
-	start = initial_addr;
+	if (en_aslr && (do_aslr == 0 || (anon &&
+	    do_aslr == aslr_sloppiness - 1))) {
+		/*
+		 * We are either at the last aslr iteration, or anon
+		 * coalescing failed on the first try. Retry with
+		 * free run.
+		 */
+		if ((map->flags & MAP_ASLR_IGNSTART) != 0)
+			start = vm_map_min(map);
+		else
+			start = addr_save;
+	}
+again_any_space:
 	vm_map_lock(map);
 	do {
 		if (find_space != VMFS_NO_SPACE) {
 			if (vm_map_findspace(map, start, length, addr) ||
 			    (max_addr != 0 && *addr + length > max_addr)) {
 				vm_map_unlock(map);
+				if (do_aslr > 0) {
+					do_aslr--;
+					goto again;
+				}
 				if (find_space == VMFS_OPTIMAL_SPACE) {
 					find_space = VMFS_ANY_SPACE;
-					goto again;
+					start = start1;
+					goto again_any_space;
 				}
 				return (KERN_NO_SPACE);
 			}
+			/*
+			 * The R step for ASLR. But skip it if we are
+			 * trying to coalesce anon memory request.
+			 */
+			if (do_aslr > 0 &&
+			    !(anon && do_aslr == aslr_sloppiness)) {
+				vm_map_lookup_entry(map, *addr, &prev_entry);
+				if (MAXPAGESIZES > 1 && pagesizes[1] != 0 &&
+				    (find_space == VMFS_SUPER_SPACE ||
+				    find_space == VMFS_OPTIMAL_SPACE))
+					pidx = 1;
+				else
+					pidx = 0;
+				re = prev_entry->next == &map->header ?
+				    map->max_offset : prev_entry->next->start;
+				rand_max = ((max_addr != 0 && re > max_addr) ?
+				    max_addr : re) - *addr - length;
+				rand_max /= pagesizes[pidx];
+				if (rand_max < aslr_pages_rnd[pidx]) {
+					vm_map_unlock(map);
+					start = re;
+					do_aslr--;
+					goto again;
+				}
+				*addr += (arc4random() % rand_max) *
+				    pagesizes[pidx];
+			}
 			switch (find_space) {
 			case VMFS_SUPER_SPACE:
 			case VMFS_OPTIMAL_SPACE:
@@ -1525,7 +1607,6 @@ again:
 				}
 				break;
 			}
-			start = *addr;
 		}
 
 		if ((cow & (MAP_STACK_GROWS_DOWN | MAP_STACK_GROWS_UP)) != 0) {
@@ -1535,8 +1616,15 @@ again:
 			result = vm_map_insert(map, object, offset, start,
 			    start + length, prot, max, cow);
 		}
+		if (result != KERN_SUCCESS && do_aslr > 0) {
+			vm_map_unlock(map);
+			do_aslr--;
+			goto again;
+		}
 	} while (result == KERN_NO_SPACE && find_space != VMFS_NO_SPACE &&
 	    find_space != VMFS_ANY_SPACE);
+	if (result == KERN_SUCCESS && anon)
+		map->anon_loc = *addr + length;
 	vm_map_unlock(map);
 	return (result);
 }
@@ -3055,6 +3143,9 @@ vm_map_delete(vm_map_t map, vm_offset_t start, vm_offset_t end)
 
 		pmap_remove(map->pmap, entry->start, entry->end);
 
+		if (entry->end == map->anon_loc)
+			map->anon_loc = entry->prev->end;
+
 		/*
 		 * Delete the entry only after removing all pmap
 		 * entries pointing to its pages. (Otherwise, its
@@ -3496,6 +3587,17 @@ vmspace_fork(struct vmspace *vm1, vm_ooffset_t *fork_charge)
 	return (vm2);
 }
 
+static vm_size_t
+vm_map_stack_initsz(vm_map_t map, vm_size_t max_ssize, vm_size_t growsize)
+{
+	vm_size_t init_ssize;
+
+	init_ssize = max_ssize;
+	if ((map->flags & MAP_ASLR) == 0 && max_ssize > growsize)
+		init_ssize = growsize;
+	return (init_ssize);
+}
+
 int
 vm_map_stack(vm_map_t map, vm_offset_t addrbos, vm_size_t max_ssize,
     vm_prot_t prot, vm_prot_t max, int cow)
@@ -3505,7 +3607,7 @@ vm_map_stack(vm_map_t map, vm_offset_t addrbos, vm_size_t max_ssize,
 	int rv;
 
 	growsize = sgrowsiz;
-	init_ssize = (max_ssize < growsize) ? max_ssize : growsize;
+	init_ssize = vm_map_stack_initsz(map, max_ssize, growsize);
 	vm_map_lock(map);
 	lmemlim = lim_cur(curthread, RLIMIT_MEMLOCK);
 	vmemlim = lim_cur(curthread, RLIMIT_VMEM);
@@ -3550,7 +3652,7 @@ vm_map_stack_locked(vm_map_t map, vm_offset_t addrbos, vm_size_t max_ssize,
 	    addrbos + max_ssize < addrbos)
 		return (KERN_NO_SPACE);
 
-	init_ssize = (max_ssize < growsize) ? max_ssize : growsize;
+	init_ssize = vm_map_stack_initsz(map, max_ssize, growsize);
 
 	/* If addr is already mapped, no go */
 	if (vm_map_lookup_entry(map, addrbos, &prev_entry))
diff --git a/sys/vm/vm_map.h b/sys/vm/vm_map.h
index 8e8ada92dc2..1e6b101af37 100644
--- a/sys/vm/vm_map.h
+++ b/sys/vm/vm_map.h
@@ -190,6 +190,7 @@ struct vm_map {
 	pmap_t pmap; /* (c) Physical map */
 #define min_offset header.start /* (c) */
 #define max_offset header.end /* (c) */
+	vm_offset_t anon_loc;
 	int busy;
 };
 
@@ -198,6 +199,8 @@ struct vm_map {
  */
 #define MAP_WIREFUTURE 0x01 /* wire all future pages */
 #define MAP_BUSY_WAKEUP 0x02
+#define MAP_ASLR 0x04 /* enabled ASLR */
+#define MAP_ASLR_IGNSTART 0x08
 
 #ifdef _KERNEL
 static __inline vm_offset_t
diff --git a/usr.bin/proccontrol/proccontrol.c b/usr.bin/proccontrol/proccontrol.c
index 4cb37018c41..3c0ad53e752 100644
--- a/usr.bin/proccontrol/proccontrol.c
+++ b/usr.bin/proccontrol/proccontrol.c
@@ -39,6 +39,7 @@ __FBSDID("$FreeBSD$");
 #include
 
 enum {
+	MODE_ASLR,
 	MODE_INVALID,
 	MODE_TRACE,
 	MODE_TRAPCAP,
@@ -62,7 +63,7 @@ static void __dead2
 usage(void)
 {
 
-	fprintf(stderr, "Usage: proccontrol -m (trace|trapcap) [-q] "
+	fprintf(stderr, "Usage: proccontrol -m (aslr|trace|trapcap) [-q] "
 	    "[-s (enable|disable)] [-p pid | command]\n");
 	exit(1);
 }
@@ -81,7 +82,9 @@ main(int argc, char *argv[])
 	while ((ch = getopt(argc, argv, "m:qs:p:")) != -1) {
 		switch (ch) {
 		case 'm':
-			if (strcmp(optarg, "trace") == 0)
+			if (strcmp(optarg, "aslr") == 0)
+				mode = MODE_ASLR;
+			else if (strcmp(optarg, "trace") == 0)
 				mode = MODE_TRACE;
 			else if (strcmp(optarg, "trapcap") == 0)
 				mode = MODE_TRAPCAP;
@@ -121,6 +124,9 @@ main(int argc, char *argv[])
 
 	if (query) {
 		switch (mode) {
+		case MODE_ASLR:
+			error = procctl(P_PID, pid, PROC_ASLR_STATUS, &arg);
+			break;
 		case MODE_TRACE:
 			error = procctl(P_PID, pid, PROC_TRACE_STATUS, &arg);
 			break;
@@ -134,6 +140,23 @@ main(int argc, char *argv[])
 		if (error != 0)
 			err(1, "procctl status");
 		switch (mode) {
+		case MODE_ASLR:
+			switch (arg & ~PROC_ASLR_ACTIVE) {
+			case PROC_ASLR_FORCE_ENABLE:
+				printf("force enabled");
+				break;
+			case PROC_ASLR_FORCE_DISABLE:
+				printf("force disabled");
+				break;
+			case PROC_ASLR_NOFORCE:
+				printf("not forced");
+				break;
+			}
+			if ((arg & PROC_ASLR_ACTIVE) != 0)
+				printf(", active\n");
+			else
+				printf(", not active\n");
+			break;
 		case MODE_TRACE:
 			if (arg == -1)
 				printf("disabled\n");
@@ -155,6 +178,11 @@ main(int argc, char *argv[])
 		}
 	} else {
 		switch (mode) {
+		case MODE_ASLR:
+			arg = enable ? PROC_ASLR_FORCE_ENABLE :
+			    PROC_ASLR_FORCE_DISABLE;
+			error = procctl(P_PID, pid, PROC_ASLR_CTL, &arg);
+			break;
 		case MODE_TRACE:
 			arg = enable ? PROC_TRACE_CTL_ENABLE :
 			    PROC_TRACE_CTL_DISABLE;