futex

Name

futex -- Fast Userspace Locking system call

Synopsis

	#include <linux/futex.h>
	#include <sys/time.h>
      

int sys_futex(void *futex, int op, int val, const struct timespec *timeout, void *futex2, int val3);

int sys_futex1(void *futex, int op, int val, const struct timespec *timeout);

int sys_futex2(void *futex1, int op, int val1, int val2, void *futex2, int val3);

int sys_futex3(void *futex1, int op, int val1, int val2, void *futex2);

Description

The sys_futex system call provides a method for a program to wait for a value at a given address to change, and a method to wake up anyone waiting on a particular address. While the addresses for the same memory in separate processes may not be identical, the kernel maps them internally so that the same memory mapped in different locations will correspond for sys_futex calls. Futexes are typically used to implement the contended case of a lock in shared memory, as described in futex(4).

When a futex(4) operation does not finish uncontended in userspace, a call needs to be made to the kernel to arbitrate. Arbitration can either mean putting the calling process to sleep or, conversely, waking a waiting process.
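The split described above, an uncontended fast path in userspace and a kernel call only under contention, can be sketched as a small lock. This is an illustration under stated assumptions, not the futex(4) implementation: it uses a three-state lock word (0 unlocked, 1 locked, 2 locked with possible waiters) along the lines of the scheme discussed in `Futexes are tricky' rather than the counter protocol of futex(4), it uses C11 atomics instead of assembly, and the names sketch_lock and sketch_unlock are hypothetical. It also assumes that atomic_int has the size and layout of a plain 32 bit int, as is the case with GCC and Clang on Linux.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* States of the lock word (an assumption of this sketch, not of futex(4)):
 * 0 = unlocked, 1 = locked, 2 = locked and there may be waiters. */

static void sketch_lock(atomic_int *word)
{
    int seen = 0;

    /* Fast path: 0 -> 1 without entering the kernel. */
    if (atomic_compare_exchange_strong(word, &seen, 1))
        return;

    /* Contended: mark the word as "may have waiters", then sleep
     * for as long as the previous holder has not released it. */
    if (seen != 2)
        seen = atomic_exchange(word, 2);
    while (seen != 0) {
        /* Returns at once if *word != 2. Errors are ignored here
         * because the loop re-checks the word either way. */
        syscall(SYS_futex, word, FUTEX_WAIT, 2, NULL, NULL, 0);
        seen = atomic_exchange(word, 2);
    }
}

static void sketch_unlock(atomic_int *word)
{
    /* 1 -> 0 means nobody waited; any other old value means
     * there may be sleepers, so reset the word and wake one. */
    if (atomic_fetch_sub(word, 1) != 1) {
        atomic_store(word, 0);
        syscall(SYS_futex, word, FUTEX_WAKE, 1, NULL, NULL, 0);
    }
}
```

In the uncontended case neither function enters the kernel: sketch_lock succeeds on the compare-and-exchange, and sketch_unlock sees the old value 1 and returns.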

Callers of this function are expected to adhere to the semantics as set out in futex(4). As these semantics involve writing non-portable assembly instructions, this in turn probably means that most users will in fact be library authors and not general application developers.

The futex and futex2 arguments need to point at aligned 32 bit integers which store the counter. The operation to execute is passed via the op parameter, along with a value val.

For historical reasons, the futex system call is highly overloaded. The sys_futex prototype is the canonical invocation, while sys_futex1 was the original implementation. Note that there is only one system call; only the number of parameters and their names differ.

Five operations are currently defined:

FUTEX_WAIT

This operation atomically verifies that the futex address still contains the value given, and sleeps awaiting FUTEX_WAKE on this futex address. If the timeout argument is non-NULL, its contents describe the maximum duration of the wait, which is infinite otherwise. For futex(4), this call is executed if decrementing the count gave a negative value (indicating contention), and will sleep until another process releases the futex and executes the FUTEX_WAKE operation.

Uses the sys_futex1() prototype.
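As a rough sketch of how FUTEX_WAIT might be issued from C, the helpers below go through the generic syscall(2) wrapper. The names futex_wait_for and futex_wait_ms are hypothetical, and the sketch assumes a Linux system where struct timespec matches what SYS_futex expects. Note that syscall(2) reports failure as -1 with the error code in errno, rather than returning the negative code directly.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: sleep while *addr still equals expected.
 * timeout is a maximum duration; NULL means wait indefinitely.
 * Returns 0 when woken, or -1 with errno set by syscall(2). */
static long futex_wait_for(uint32_t *addr, uint32_t expected,
                           const struct timespec *timeout)
{
    return syscall(SYS_futex, addr, FUTEX_WAIT, expected, timeout, NULL, 0);
}

/* Hypothetical convenience wrapper: wait for at most ms milliseconds. */
static long futex_wait_ms(uint32_t *addr, uint32_t expected, long ms)
{
    struct timespec ts = {
        .tv_sec = ms / 1000,
        .tv_nsec = (ms % 1000) * 1000000L,
    };
    return futex_wait_for(addr, expected, &ts);
}
```

The kernel performs the comparison and the decision to sleep atomically, so a caller that passes a stale expected value gets an immediate error instead of sleeping through a wakeup it has already missed.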

FUTEX_WAKE

This operation wakes at most val processes waiting on this futex address (i.e. inside FUTEX_WAIT). For futex(4), this is executed if incrementing the count showed that there were waiters, once the futex value has been set to 1 (indicating that it is available).

Uses the sys_futex1() prototype.
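A minimal sketch of the wake side, again through syscall(2), might look as follows. The helper names are hypothetical. Passing INT_MAX as val is one common way to ask for every waiter to be woken, since val is only an upper bound.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <linux/futex.h>
#include <stdint.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: wake at most max_waiters sleepers on addr.
 * Returns the number actually woken, or -1 with errno set. */
static long futex_wake_n(uint32_t *addr, int max_waiters)
{
    return syscall(SYS_futex, addr, FUTEX_WAKE, max_waiters, NULL, NULL, 0);
}

/* Wake a single waiter, e.g. when releasing a lock. */
static long futex_wake_one(uint32_t *addr)
{
    return futex_wake_n(addr, 1);
}

/* Wake every waiter, e.g. for a broadcast. */
static long futex_wake_all(uint32_t *addr)
{
    return futex_wake_n(addr, INT_MAX);
}
```

The caller is expected to have updated the futex value before waking, so that woken processes (and any that are about to call FUTEX_WAIT) see the new state.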

FUTEX_CMP_REQUEUE

Identical to FUTEX_WAKE except that any processes in excess of the number specified in val are not woken up but moved to the futex passed in futex2. The number of processes to move is capped by val2. Passing 0 in val2 makes this operation degenerate into FUTEX_WAKE.

This operation can be used to efficiently implement pthread_cond_broadcast(3) on larger SMP systems.

The value passed in val3 is compared to that of the futex, exactly like FUTEX_WAIT. Full details are outside the scope of this manual page; see the 'Futexes are tricky' article referenced below.

Available since kernel version 2.6.7 and uses the sys_futex2() prototype.
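The argument layout of the sys_futex2() prototype can be sketched as below. The helper name futex_cmp_requeue and its parameter names are hypothetical. The one non-obvious detail is that val2 travels in the slot that other operations use for the timeout pointer, so it is cast to an integer of pointer width here.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: if *from still equals expected (val3), wake up
 * to wake (val1) waiters on from and move at most requeue_cap (val2)
 * of the remaining ones to the futex at to (futex2).
 * Returns a count on success, or -1 with errno set (EAGAIN when the
 * value at from no longer matches expected). */
static long futex_cmp_requeue(uint32_t *from, int wake, int requeue_cap,
                              uint32_t *to, uint32_t expected)
{
    return syscall(SYS_futex, from, FUTEX_CMP_REQUEUE, wake,
                   (unsigned long)requeue_cap, to, expected);
}
```

A broadcast on a condition variable could, for example, call this with wake set to 1 and requeue_cap set to INT_MAX, so that one waiter runs and the rest are queued on the associated lock instead of all waking at once.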

FUTEX_REQUEUE

Identical to FUTEX_CMP_REQUEUE except that it does not use val3, and has been shown to be prone to race conditions when used to implement pthread_cond_broadcast(3). Uses the sys_futex3() prototype.

FUTEX_FD

To support asynchronous wakeups, this operation associates a file descriptor with a futex. If another process executes a FUTEX_WAKE, the process will receive the signal number that was passed in val. The calling process must close the returned file descriptor after use.

To prevent race conditions, the caller should test if the futex has been upped after FUTEX_FD returns.

Uses the sys_futex1() prototype.

Return value

Depending on which operation was executed, the returned value has different meanings.

FUTEX_WAIT

Returns 0 if the process was woken by a FUTEX_WAKE call. In case of timeout, -ETIMEDOUT is returned. If the futex was not equal to the expected value, the operation returns -EWOULDBLOCK. Signals (or other spurious wakeups) cause FUTEX_WAIT to return -EINTR.
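The outcomes listed above can be told apart as in the following sketch. It assumes the call is made through syscall(2), which returns -1 and stores the positive error code in errno instead of returning the negative value. The enum and the function name futex_wait_once are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

enum wait_outcome {
    WAIT_WOKEN,          /* 0: woken by FUTEX_WAKE */
    WAIT_TIMED_OUT,      /* ETIMEDOUT */
    WAIT_VALUE_CHANGED,  /* EWOULDBLOCK: futex did not hold expected */
    WAIT_INTERRUPTED,    /* EINTR: signal or spurious wakeup */
    WAIT_FAILED          /* anything else, e.g. EINVAL or EFAULT */
};

/* Hypothetical helper: perform one FUTEX_WAIT and classify the result. */
static enum wait_outcome futex_wait_once(uint32_t *addr, uint32_t expected,
                                         const struct timespec *timeout)
{
    if (syscall(SYS_futex, addr, FUTEX_WAIT, expected, timeout, NULL, 0) == 0)
        return WAIT_WOKEN;

    switch (errno) {
    case ETIMEDOUT:
        return WAIT_TIMED_OUT;
    case EAGAIN:         /* same value as EWOULDBLOCK on Linux */
        return WAIT_VALUE_CHANGED;
    case EINTR:
        return WAIT_INTERRUPTED;
    default:
        return WAIT_FAILED;
    }
}
```

Callers normally treat WAIT_WOKEN, WAIT_VALUE_CHANGED and WAIT_INTERRUPTED the same way: re-read the futex value and decide whether to wait again.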

FUTEX_WAKE and FUTEX_REQUEUE

Returns the number of processes woken up (which equals the number of FUTEX_WAIT operations returning 0).

FUTEX_CMP_REQUEUE

Returns -EAGAIN if the futex did not have the value specified in val3. Otherwise returns the number of processes woken up. If this is larger than val1, processes have been requeued.

FUTEX_FD

Returns the new file descriptor associated with the futex.

All operations may return -EINVAL in case of unaligned futexes, as well as -EFAULT, -EPERM or -EACCES when passing pointers to bad or inaccessible memory.

Notes

To reiterate, bare futexes are not intended as an easy-to-use abstraction for end users. Implementors are expected to be assembly literate and to have read the sources of the futex userspace library referenced below.

Authors

Futexes were designed and worked on by Hubertus Franke (IBM Thomas J. Watson Research Center), Matthew Kirkwood, Ingo Molnar (Red Hat) and Rusty Russell (IBM Linux Technology Center). This page was written by bert hubert.

Versions

Initial futex support was merged in Linux 2.5.7 but with different semantics from those described above. Current semantics are available from Linux 2.5.40 onwards. FUTEX_REQUEUE was added around 2.5.70, and FUTEX_CMP_REQUEUE was added in 2.6.7.

See also

futex(4), `Fuss, Futexes and Furwocks: Fast Userlevel Locking in Linux' (proceedings of the Ottawa Linux Symposium 2002), futex example library, futex-*.tar.bz2, `Futexes are tricky' at http://people.redhat.com/drepper/futex.pdf