This site is no longer maintained. Go to the FreeBSD site.
The patch for FreeBSD-CURRENT adds a switch that disables allocation of anonymous memory that cannot be backed by swap; i.e., once the switch is turned on, the total amount of anonymous memory in the system cannot exceed the swap size. In addition, the amount of memory allocated under each real uid is tracked and can be limited (by RLIMIT_SWAP).
The patch does this by accounting for mapped and brk-ed memory, /dev/zero mappings, SysV shm allocated from swap (when kern.ipc.shm_use_phys = 0), and swap-based md disks.
The current amount of accounted swap space is exported as the sysctl vm.swap_reserved (the count is in bytes). The sysctl vm.overcommit controls the swap allocation policy: setting bit 0 to 1 denies an allocation request if, after the request, the total reserved swap space would exceed the size of the configured swap.
Bit 2 allows non-wired physical memory to be counted as swap, similar to how swap reservation works on Solaris. Additionally, free_reserved pages (exported as vm.stats.vm.v_free_target) can never be allocated from userspace, which helps avoid deadlocks.
Setting bit 1 enables enforcement of the per-user RLIMIT_SWAP limits. These limits may be set in login.conf with the swapuse capability. /bin/sh, /bin/csh, and /usr/bin/limits are all patched to know about RLIMIT_SWAP.
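The three vm.overcommit bits described above can be summarized as a small userspace model. This is only a sketch, not the kernel code: the macro names, the two structs, and reserve_allowed() are made up for illustration, and it assumes that bit 2 only matters when bit 0 is set and that the non-wired memory figure already excludes the reserved free pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions follow the text; the macro names are hypothetical. */
#define OVERCOMMIT_DENY_OVER_SWAP (1u << 0) /* bit 0: total reservation limited by swap */
#define OVERCOMMIT_ENFORCE_RLIMIT (1u << 1) /* bit 1: enforce per-user RLIMIT_SWAP */
#define OVERCOMMIT_COUNT_PHYS     (1u << 2) /* bit 2: count non-wired RAM as swap */

/* Hypothetical snapshot of the system-wide counters, all in bytes. */
struct swap_state {
    uint64_t swap_total;    /* size of the configured swap */
    uint64_t swap_reserved; /* what vm.swap_reserved would report */
    uint64_t phys_nonwired; /* non-wired physical memory (assumed to exclude reserved free pages) */
};

/* Hypothetical per-real-uid accounting. */
struct user_state {
    uint64_t reserved; /* bytes already charged to this uid */
    uint64_t limit;    /* the uid's RLIMIT_SWAP */
};

/* Returns true if a reservation of `request` bytes would be admitted.
 * Arithmetic overflow is ignored for brevity. */
bool reserve_allowed(unsigned overcommit, const struct swap_state *s,
                     const struct user_state *u, uint64_t request)
{
    assert(s != NULL && u != NULL);

    if ((overcommit & OVERCOMMIT_ENFORCE_RLIMIT) &&
        u->reserved + request > u->limit)
        return false;

    if (overcommit & OVERCOMMIT_DENY_OVER_SWAP) {
        uint64_t avail = s->swap_total;

        if (overcommit & OVERCOMMIT_COUNT_PHYS)
            avail += s->phys_nonwired;
        if (s->swap_reserved + request > avail)
            return false;
    }
    return true;
}
```

With vm.overcommit set to 0, the model admits every request, which corresponds to the unpatched behaviour.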
See also tuning(7) and getrlimit(2) in the patched sources.
The goal of the patch is to charge for OBJT_SWAP and OBJT_DEFAULT objects. However, the objects backing anonymous memory are usually created at fault time (or when a vm_map_entry is clipped, etc.), not when the entry is created. So both vm_map_entry and vm_object got a uip field that points to a struct uidinfo. A non-null value in this field means that the entry or object is charged, and it points to the uidinfo of the ruid that allocated the memory. When the vm_object for a vm_map_entry is created, uip is migrated from the entry to the object.
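The migration of uip from the entry to the object can be pictured with a simplified model. The struct layouts and the function names entry_charge() and entry_attach_object() are hypothetical; the real kernel structures carry many more fields, and clearing the entry's uip after migration is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct uidinfo: only the swap accounting. */
struct uidinfo {
    unsigned long swap_charged; /* bytes charged to this real uid */
};

/* Simplified stand-ins for vm_object and vm_map_entry. */
struct vm_object_model {
    struct uidinfo *uip; /* non-NULL: the object is charged */
};

struct vm_map_entry_model {
    struct uidinfo *uip;            /* non-NULL: the entry is charged */
    struct vm_object_model *object; /* NULL until the first fault */
    size_t size;
};

/* Charge the entry when it is created, before any backing object exists. */
void entry_charge(struct vm_map_entry_model *e, struct uidinfo *ruid)
{
    assert(e->uip == NULL && e->object == NULL);
    ruid->swap_charged += e->size;
    e->uip = ruid;
}

/* At fault time the backing object appears; the charge moves to it,
 * so the memory is accounted exactly once. */
void entry_attach_object(struct vm_map_entry_model *e, struct vm_object_model *o)
{
    assert(e->object == NULL && o->uip == NULL);
    e->object = o;
    o->uip = e->uip; /* migrate the charge owner to the object */
    e->uip = NULL;   /* assumption: the entry no longer holds the charge */
}
```

The amount charged to the uid does not change during migration; only the holder of the reference does.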
The vm_map_insert function decides whether the mapped entry should be charged. The decision can be influenced by the MAP_ACC_CHARGED and MAP_ACC_NO_CHARGE flags. MAP_ACC_CHARGED means that the memory is already charged by some other means; MAP_ACC_NO_CHARGE forbids charging even if the entry looks like it should be charged. E.g., I/O buffers are inserted into the kernel map like anonymous memory.
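The effect of the two flags on the charging decision can be sketched as follows. The flag names come from the text, but the numeric values, the enum, and decide_charge() are invented for this illustration, and the assertion that both flags are never passed together is an assumption rather than something the patch is known to check.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag names are from the text; the values here are arbitrary. */
#define MAP_ACC_CHARGED   0x1 /* caller has already charged the memory */
#define MAP_ACC_NO_CHARGE 0x2 /* never charge this entry */

enum charge_action {
    CHARGE_NONE,    /* the entry is not accounted at all */
    CHARGE_ALREADY, /* accounted elsewhere, do not reserve again */
    CHARGE_NOW      /* reserve swap while inserting the entry */
};

/* looks_anonymous: the entry would normally be charged (e.g. anon memory). */
enum charge_action decide_charge(int flags, bool looks_anonymous)
{
    /* Assumption of the sketch: the two flags are mutually exclusive. */
    assert((flags & (MAP_ACC_CHARGED | MAP_ACC_NO_CHARGE)) !=
           (MAP_ACC_CHARGED | MAP_ACC_NO_CHARGE));

    if (flags & MAP_ACC_NO_CHARGE)
        return CHARGE_NONE;
    if (flags & MAP_ACC_CHARGED)
        return CHARGE_ALREADY;
    return looks_anonymous ? CHARGE_NOW : CHARGE_NONE;
}
```

In this model the I/O buffer case from the text corresponds to decide_charge(MAP_ACC_NO_CHARGE, true), which yields CHARGE_NONE.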
Objects sometimes have dead pieces that will never reference pages and will not be accessed through any map entry. This can happen, e.g., after vm_object_split. So a charge field was added to vm_object that shows how much swap is really reserved for the object.
The patch accounts for any mapping that could lead to the use of swap space. E.g., shared anonymous memory and private mappings of files are charged. However, executables and shared libraries have their text segments mapped private and read-only (see below for VM_PROT_OVERRIDE_WRITE). So a kludge was added to not charge for private read-only mappings of files. If the area is later made writable with mprotect(2), the object is charged. Beware: as a result, mprotect(2) and ptrace(2) may return ENOMEM.
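The charging rule and its mprotect(2) consequence can be modelled in a few lines. This is a sketch under stated assumptions: struct mapping, mapping_needs_charge(), and model_mprotect_writable() are hypothetical names, a single swap_free counter stands in for the real reservation logic, and treating shared file mappings as uncharged is inferred from the "could lead to the use of swap" criterion rather than stated in the text.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one mapping. */
struct mapping {
    bool file_backed;
    bool shared;
    bool writable;
    bool charged;
    size_t size;
};

/* Does this mapping have to be backed by a swap reservation? */
bool mapping_needs_charge(const struct mapping *m)
{
    if (!m->file_backed)
        return true;    /* anonymous memory, shared or private */
    if (m->shared)
        return false;   /* inferred: dirty pages go back to the file */
    return m->writable; /* private file mapping: charged unless read-only */
}

/* Model of making a mapping writable. Returns 0 or ENOMEM. */
int model_mprotect_writable(struct mapping *m, size_t *swap_free)
{
    struct mapping after;

    assert(m != NULL && swap_free != NULL);
    after = *m;
    after.writable = true;
    if (!m->charged && mapping_needs_charge(&after)) {
        if (after.size > *swap_free)
            return ENOMEM; /* the mapping is left unchanged */
        *swap_free -= after.size;
        after.charged = true;
    }
    *m = after;
    return 0;
}
```

A text segment starts out uncharged in this model and only acquires a charge, or fails with ENOMEM, at the point where it is made writable.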
The (r)fork(2) code for the case without the RFMEM flag had to be restructured to fail early if swap space cannot be reserved for both parent and child (since private mappings are COW for both processes). As a result, this leads to a LOR (lock order reversal) at boot, since both the proc and vmspace locks are taken.
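The "fail early" idea can be sketched as a reservation pass that runs before any address space is copied. Everything here is hypothetical: struct fork_entry, fork_reserve(), and the single swap_free counter are illustrative only, and the sketch assumes that only private writable (COW) areas need an extra reservation for the child.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one map entry of the parent. */
struct fork_entry {
    size_t size;
    bool private_writable; /* will become copy-on-write after fork */
};

/* Reserve swap for all COW areas up front. Returns 0 on success or
 * ENOMEM without touching *swap_free, so fork can fail before any
 * part of the address space has been duplicated. */
int fork_reserve(const struct fork_entry *entries, size_t n, size_t *swap_free)
{
    size_t need = 0;

    assert(swap_free != NULL && (entries != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        if (entries[i].private_writable)
            need += entries[i].size;
    }
    if (need > *swap_free)
        return ENOMEM;
    *swap_free -= need;
    return 0;
}
```

Doing the whole reservation in one step means there is nothing to roll back when it fails, which is the motivation for reserving before copying.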
Your feedback is welcome. My mail is kostikbel gmail com
Initial public version.
Removed some debug code.
Checked per-user useswap limit. VM_PROT_OVERRIDE_WRITE is charged (to the debugger owner). Fixed a serious bug in rfork (vm_forkproc).
Fixed (unimportant) merge conflict in procfs.
Reverted files generated from syscalls.mk. Patch refreshed against current.
Patch refreshed against current. Fixed merge conflict.
Patch refreshed against current. Wrote documentation for vm.overcommit. Moved vmspace.vm_fork_charge under #ifdef INVARIANTS. Included the patch to make libc text PIC (filed as PR i386/85242). Fixed accounting in vm_object_coalesce, which cured panics in the pipe code.
Patch refreshed against current. Clarified RLIMIT_SWAP behaviour for root-owned processes. Taught tcsh about RLIMIT_SWAP. Removed the debug argument from the swap_re* functions. Fixed (?) accounting for stacks that grow up. Fixed an accounting leak in the md driver for error cases. Documented the MAP_ACC_* flags in vm_map(9).
Fixed a per-uid process number accounting leak on failed fork. Fixed a non-deterministic wait in kern_alloc_wait (execve now returns ENOMEM instead of waiting on a low memory condition). Added the ability to count free physical memory as virtual swap (like Solaris). Fork no longer charges some of the need-copy areas to the ruid of the forked process.
Recently, alc@ changed the code to use sf_buf_alloc() instead of vm_map_find() to do non-page-aligned mapping of the sections (see rev. 1.167 of sys/kern/imgact_elf.c). Amazingly, the changes do not conflict with my patch, and I do not need to refresh it. Moreover, I think the patch is still applicable to RELENG_6.
Patch refreshed against current.
Patch refreshed against current. Fixed three merge conflicts. jemalloc really likes swap!
$Id: index-overcommit.html,v 1.22 2006/06/08 11:29:18 kostik Exp $