This site is no longer maintained. Go to the FreeBSD site.

FreeBSD overcommit enable/disable patch

The patch for FreeBSD-CURRENT adds a switch that disables allocation of anonymous memory that cannot be backed by swap; that is, once the switch is turned on, the total amount of anonymous memory in the system cannot exceed the swap size. In addition, the amount of memory allocated under each real uid is tracked and can be limited (by RLIMIT_SWAP).

The patch does this by accounting for mapped and brk-ed memory, /dev/zero mappings, SysV shm allocated from swap (when kern.ipc.shm_use_phys = 0), and swap-based md disks.

How to use

The current amount of accounted swap space is exported as the sysctl vm.swap_reserved (the count is in bytes). The sysctl vm.overcommit controls the swap allocation policy: setting bit 0 to 1 denies an allocation request if, after the request, the total reserved swap space would exceed the size of the configured swap.

Bit 2 allows non-wired physical memory to be counted as swap. This is similar to how swap reservation works on Solaris. Additionally, free_reserved pages (exported as vm.stats.vm.v_free_target) can never be allocated from userspace, which helps avoid deadlocks.

Setting bit 1 enables enforcement of the per-user RLIMIT_SWAP limits. These limits may be set in login.conf through the swapuse capability. /bin/sh, /bin/csh and /usr/bin/limits are all patched to know about RLIMIT_SWAP.
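
The three policy bits can be illustrated with a small userspace model. This is only a sketch, not the patch's code: the names OC_STRICT, OC_USER_LIMIT, OC_COUNT_PHYS, struct oc_state and oc_admit are hypothetical, and it assumes that bit 2 only widens the limit checked under bit 0. Only the bit positions and their meaning come from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; only the bit positions are taken from the text. */
#define OC_STRICT     (1u << 0) /* deny requests that exceed configured swap */
#define OC_USER_LIMIT (1u << 1) /* enforce the per-user RLIMIT_SWAP */
#define OC_COUNT_PHYS (1u << 2) /* count non-wired physical memory as swap */

struct oc_state {
	unsigned policy;        /* value of vm.overcommit */
	uint64_t swap_total;    /* configured swap, in bytes */
	uint64_t swap_reserved; /* vm.swap_reserved, in bytes */
	uint64_t phys_total;    /* physical memory, in bytes */
	uint64_t phys_wired;    /* wired memory, in bytes */
	uint64_t free_reserved; /* memory never given to userspace, in bytes */
};

/*
 * Model of the admission check for a request of req bytes made by a user
 * who already holds user_reserved bytes under a limit of user_limit bytes.
 * Arithmetic overflow is ignored to keep the sketch short.
 */
static bool
oc_admit(const struct oc_state *s, uint64_t req, uint64_t user_reserved,
    uint64_t user_limit)
{
	if ((s->policy & OC_USER_LIMIT) != 0 &&
	    user_reserved + req > user_limit)
		return (false);

	if ((s->policy & OC_STRICT) != 0) {
		uint64_t avail = s->swap_total;

		if ((s->policy & OC_COUNT_PHYS) != 0) {
			/* Assumption: wired and reserved pages are excluded. */
			uint64_t excluded = s->phys_wired + s->free_reserved;

			if (s->phys_total > excluded)
				avail += s->phys_total - excluded;
		}
		if (s->swap_reserved + req > avail)
			return (false);
	}
	return (true);
}
```

With policy 0 every request is admitted in this model, which corresponds to the default overcommit behaviour.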

See also tuning(7) and getrlimit(2) in the patched sources.

Implementation overview

The goal of the patch is to charge for OBJT_SWAP and OBJT_DEFAULT objects. Usually, however, the objects backing anonymous memory are created at fault time (or when a vm_map_entry is clipped, etc.), not when the entry is created. So both vm_map_entry and vm_object gained a uip field that points to a struct uidinfo. A non-null value in this field means that the entry or object is charged, and it points to the uidinfo of the ruid that allocated the memory. When the vm_object for a vm_map_entry is created, uip is migrated from the entry to the object.
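
The hand-over of the charge from entry to object can be sketched in a userspace model. All type and function names here (uidinfo_model, entry_model, object_model, entry_charge, entry_attach_object) are hypothetical stand-ins, not the kernel structures, and the sketch assumes the user's total is left untouched by the migration.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct uidinfo: per-ruid accounting. */
struct uidinfo_model {
	uint64_t swap_charged; /* bytes charged to this user */
};

/* Stand-in for vm_object. */
struct object_model {
	struct uidinfo_model *uip; /* non-NULL: object is charged */
};

/* Stand-in for vm_map_entry. */
struct entry_model {
	struct uidinfo_model *uip; /* non-NULL: entry is charged */
	struct object_model *object;
	uint64_t size;
};

/* Charge the entry to a user at the time the entry is created. */
static void
entry_charge(struct entry_model *e, struct uidinfo_model *ui)
{
	ui->swap_charged += e->size;
	e->uip = ui;
}

/*
 * Called when the backing object is created lazily (e.g. at fault time).
 * The charge reference moves from the entry to the object, so the memory
 * is never counted twice and the user's total does not change.
 */
static void
entry_attach_object(struct entry_model *e, struct object_model *o)
{
	e->object = o;
	if (e->uip != NULL) {
		o->uip = e->uip;
		e->uip = NULL;
	}
}
```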

The vm_map_insert function decides whether the mapped entry should be charged. The decision can be influenced by the MAP_ACC_CHARGED and MAP_ACC_NO_CHARGE flags. MAP_ACC_CHARGED means that the memory is already charged by some other means; MAP_ACC_NO_CHARGE forbids charging even if the entry looks like it should be charged. For example, I/O buffers are inserted into the kernel map like anonymous memory.
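
The flag handling can be sketched as a small decision function. The flag values and the names MODEL_ACC_CHARGED, MODEL_ACC_NO_CHARGE and insert_decision are placeholders chosen for illustration, since the real values are not given here; the precedence used when both flags are set is also an assumption.

```c
#include <stdbool.h>

/* Placeholder values; the patch's real flag values are not shown here. */
#define MODEL_ACC_CHARGED   0x1 /* memory was already charged elsewhere */
#define MODEL_ACC_NO_CHARGE 0x2 /* never charge this entry */

enum charge_decision {
	CHARGE_SKIP,    /* entry is not charged */
	CHARGE_ALREADY, /* entry is marked charged, no new reservation */
	CHARGE_NOW      /* swap must be reserved before inserting */
};

/*
 * Decide what to do for a new map entry. looks_chargeable says whether
 * the entry would normally be charged (e.g. anonymous memory).
 */
static enum charge_decision
insert_decision(int flags, bool looks_chargeable)
{
	/* Assumption: NO_CHARGE wins if both flags are passed. */
	if ((flags & MODEL_ACC_NO_CHARGE) != 0)
		return (CHARGE_SKIP);
	if ((flags & MODEL_ACC_CHARGED) != 0)
		return (CHARGE_ALREADY);
	return (looks_chargeable ? CHARGE_NOW : CHARGE_SKIP);
}
```

In this model an I/O buffer inserted into the kernel map would pass MODEL_ACC_NO_CHARGE and be skipped even though it looks like anonymous memory.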

Objects sometimes have dead pieces that will never reference pages and will never be accessed through any map entry. This can happen, e.g., after vm_object_split. So a charge field was added to vm_object that shows how much swap is really reserved for the object.

The patch accounts for any mapping that could lead to the use of swap space. For example, shared anonymous memory and private mappings of files are charged. However, executables and shared libraries have their text segments mapped private and read-only (see below for VM_PROT_OVERRIDE_WRITE). So a kludge was added to not charge for private read-only mappings of files. If such an area is later made writable with mprotect(2), the object is charged. Beware: mprotect(2) and ptrace(2) may therefore return ENOMEM.
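
The charging rule and the late charge on mprotect(2) can be modelled as follows. This is a sketch with hypothetical names (mapping_model, mapping_needs_charge, model_mprotect_writable); it assumes that shared file mappings are not charged because their changes go back to the file, which is not stated explicitly above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

struct mapping_model {
	bool file_backed; /* false: anonymous memory */
	bool shared;      /* false: private (COW) mapping */
	bool writable;
	bool charged;     /* swap is already reserved for this mapping */
	uint64_t size;    /* bytes */
};

/* Could this mapping ever need swap space? */
static bool
mapping_needs_charge(const struct mapping_model *m)
{
	if (!m->file_backed)
		return (true);   /* anonymous, shared or private */
	if (m->shared)
		return (false);  /* assumption: dirty pages go to the file */
	return (m->writable);    /* private file mapping: only if writable */
}

/*
 * Model of making a mapping writable. avail is the number of bytes that
 * can still be reserved. Returns 0 on success or ENOMEM when the late
 * reservation fails, in which case the mapping is left unchanged.
 */
static int
model_mprotect_writable(struct mapping_model *m, uint64_t *avail)
{
	struct mapping_model want = *m;

	want.writable = true;
	if (mapping_needs_charge(&want) && !m->charged) {
		if (m->size > *avail)
			return (ENOMEM);
		*avail -= m->size;
		want.charged = true;
	}
	*m = want;
	return (0);
}
```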

The (r)fork(2) code path without the RFMEM flag had to be restructured to fail early if swap space cannot be reserved for both parent and child (since private mappings are COW for both processes). As a result, it leads to a LOR during boot (since both the proc and vmspace locks are taken).
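
The fail-early idea can be sketched as a single reservation step that runs before any entry is copied. The names fork_entry and fork_reserve are hypothetical, and the sketch assumes the parent's mappings are already charged, so only the child's copy needs a new reservation.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct fork_entry {
	uint64_t size;        /* bytes */
	bool private_charged; /* private mapping that is charged in the parent */
};

/*
 * Reserve swap for the child's COW copies of all charged private entries.
 * avail is the number of bytes that can still be reserved. Either the
 * whole amount is reserved and 0 is returned, or nothing is reserved and
 * ENOMEM is returned, so fork can fail before any state is duplicated.
 */
static int
fork_reserve(const struct fork_entry *entries, size_t n, uint64_t *avail)
{
	uint64_t need = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (entries[i].private_charged)
			need += entries[i].size;
	}
	if (need > *avail)
		return (ENOMEM);
	*avail -= need;
	return (0);
}
```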


The patch is in the alpha stage. I built it for i386. The kernel boots in qemu, and vm.swap_reserved shows reasonable numbers. After shutting down to single-user mode, swap_reserved is also reasonable. So, I hope, no obvious major accounting leak is present.

Your feedback is welcome. My mail is kostikbel gmail com

$Id: index-overcommit.html,v 1.22 2006/06/08 11:29:18 kostik Exp $
