Patches contributed by Eötvös Lorand University
commit e2970f2fb6950183a34e8545faa093eb49d186e1
Author: Ingo Molnar <mingo@elte.hu>
Date: Tue Jun 27 02:54:47 2006 -0700
[PATCH] pi-futex: futex code cleanups
We are pleased to announce "lightweight userspace priority inheritance" (PI)
support for futexes. The following patchset and glibc patch implements it,
ontop of the robust-futexes patchset which is included in 2.6.16-mm1.
We are calling it lightweight for 3 reasons:
- in the user-space fastpath a PI-enabled futex involves no kernel work
(or any other PI complexity) at all. No registration, no extra kernel
calls - just pure fast atomic ops in userspace.
- in the slowpath (in the lock-contention case), the system call and
scheduling pattern is in fact better than that of normal futexes, due to
the 'integrated' nature of FUTEX_LOCK_PI. [more about that further down]
- the in-kernel PI implementation is streamlined around the mutex
abstraction, with strict rules that keep the implementation relatively
simple: only a single owner may own a lock (i.e. no read-write lock
support), only the owner may unlock a lock, no recursive locking, etc.
Priority Inheritance - why, oh why???
-------------------------------------
Many of you heard the horror stories about the evil PI code circling Linux for
years, which makes no real sense at all and is only used by buggy applications
and which has horrible overhead. Some of you have dreaded this very moment,
when someone actually submits working PI code ;-)
So why would we like to see PI support for futexes?
We'd like to see it done purely for technological reasons. We dont think it's
a buggy concept, we think it's useful functionality to offer to applications,
which functionality cannot be achieved in other ways. We also think it's the
right thing to do, and we think we've got the right arguments and the right
numbers to prove that. We also believe that we can address all the
counter-arguments as well. For these reasons (and the reasons outlined below)
we are submitting this patch-set for upstream kernel inclusion.
What are the benefits of PI?
The short reply:
----------------
User-space PI helps achieving/improving determinism for user-space
applications. In the best-case, it can help achieve determinism and
well-bound latencies. Even in the worst-case, PI will improve the statistical
distribution of locking related application delays.
The longer reply:
-----------------
Firstly, sharing locks between multiple tasks is a common programming
technique that often cannot be replaced with lockless algorithms. As we can
see it in the kernel [which is a quite complex program in itself], lockless
structures are rather the exception than the norm - the current ratio of
lockless vs. locky code for shared data structures is somewhere between 1:10
and 1:100. Lockless is hard, and the complexity of lockless algorithms often
endangers to ability to do robust reviews of said code. I.e. critical RT
apps often choose lock structures to protect critical data structures, instead
of lockless algorithms. Furthermore, there are cases (like shared hardware,
or other resource limits) where lockless access is mathematically impossible.
Media players (such as Jack) are an example of reasonable application design
with multiple tasks (with multiple priority levels) sharing short-held locks:
for example, a highprio audio playback thread is combined with medium-prio
construct-audio-data threads and low-prio display-colory-stuff threads. Add
video and decoding to the mix and we've got even more priority levels.
So once we accept that synchronization objects (locks) are an unavoidable fact
of life, and once we accept that multi-task userspace apps have a very fair
expectation of being able to use locks, we've got to think about how to offer
the option of a deterministic locking implementation to user-space.
Most of the technical counter-arguments against doing priority inheritance
only apply to kernel-space locks. But user-space locks are different, there
we cannot disable interrupts or make the task non-preemptible in a critical
section, so the 'use spinlocks' argument does not apply (user-space spinlocks
have the same priority inversion problems as other user-space locking
constructs). Fact is, pretty much the only technique that currently enables
good determinism for userspace locks (such as futex-based pthread mutexes) is
priority inheritance:
Currently (without PI), if a high-prio and a low-prio task shares a lock [this
is a quite common scenario for most non-trivial RT applications], even if all
critical sections are coded carefully to be deterministic (i.e. all critical
sections are short in duration and only execute a limited number of
instructions), the kernel cannot guarantee any deterministic execution of the
high-prio task: any medium-priority task could preempt the low-prio task while
it holds the shared lock and executes the critical section, and could delay it
indefinitely.
Implementation:
---------------
As mentioned before, the userspace fastpath of PI-enabled pthread mutexes
involves no kernel work at all - they behave quite similarly to normal
futex-based locks: a 0 value means unlocked, and a value==TID means locked.
(This is the same method as used by list-based robust futexes.) Userspace uses
atomic ops to lock/unlock these mutexes without entering the kernel.
To handle the slowpath, we have added two new futex ops:
FUTEX_LOCK_PI
FUTEX_UNLOCK_PI
If the lock-acquire fastpath fails, [i.e. an atomic transition from 0 to TID
fails], then FUTEX_LOCK_PI is called. The kernel does all the remaining work:
if there is no futex-queue attached to the futex address yet then the code
looks up the task that owns the futex [it has put its own TID into the futex
value], and attaches a 'PI state' structure to the futex-queue. The pi_state
includes an rt-mutex, which is a PI-aware, kernel-based synchronization
object. The 'other' task is made the owner of the rt-mutex, and the
FUTEX_WAITERS bit is atomically set in the futex value. Then this task tries
to lock the rt-mutex, on which it blocks. Once it returns, it has the mutex
acquired, and it sets the futex value to its own TID and returns. Userspace
has no other work to perform - it now owns the lock, and futex value contains
FUTEX_WAITERS|TID.
If the unlock side fastpath succeeds, [i.e. userspace manages to do a TID ->
0 atomic transition of the futex value], then no kernel work is triggered.
If the unlock fastpath fails (because the FUTEX_WAITERS bit is set), then
FUTEX_UNLOCK_PI is called, and the kernel unlocks the futex on the behalf of
userspace - and it also unlocks the attached pi_state->rt_mutex and thus wakes
up any potential waiters.
Note that under this approach, contrary to other PI-futex approaches, there is
no prior 'registration' of a PI-futex. [which is not quite possible anyway,
due to existing ABI properties of pthread mutexes.]
Also, under this scheme, 'robustness' and 'PI' are two orthogonal properties
of futexes, and all four combinations are possible: futex, robust-futex,
PI-futex, robust+PI-futex.
glibc support:
--------------
Ulrich Drepper and Jakub Jelinek have written glibc support for PI-futexes
(and robust futexes), enabling robust and PI (PTHREAD_PRIO_INHERIT) POSIX
mutexes. (PTHREAD_PRIO_PROTECT support will be added later on too, no
additional kernel changes are needed for that). [NOTE: The glibc patch is
obviously inofficial and unsupported without matching upstream kernel
functionality.]
the patch-queue and the glibc patch can also be downloaded from:
http://redhat.com/~mingo/PI-futex-patches/
Many thanks go to the people who helped us create this kernel feature: Steven
Rostedt, Esben Nielsen, Benedikt Spranger, Daniel Walker, John Cooper, Arjan
van de Ven, Oleg Nesterov and others. Credits for related prior projects goes
to Dirk Grambow, Inaky Perez-Gonzalez, Bill Huey and many others.
Clean up the futex code, before adding more features to it:
- use u32 as the futex field type - that's the ABI
- use __user and pointers to u32 instead of unsigned long
- code style / comment style cleanups
- rename hash-bucket name from 'bh' to 'hb'.
I checked the pre and post futex.o object files to make sure this
patch has no code effects.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
diff --git a/include/linux/futex.h b/include/linux/futex.h
index 966a5b3da439..f05a3f469322 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -90,9 +90,8 @@ struct robust_list_head {
*/
#define ROBUST_LIST_LIMIT 2048
-long do_futex(unsigned long uaddr, int op, int val,
- unsigned long timeout, unsigned long uaddr2, int val2,
- int val3);
+long do_futex(u32 __user *uaddr, int op, u32 val, unsigned long timeout,
+ u32 __user *uaddr2, u32 val2, u32 val3);
extern int handle_futex_death(u32 __user *uaddr, struct task_struct *curr);
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 33785b79d548..008f04c56737 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -174,9 +174,9 @@ asmlinkage long sys_waitid(int which, pid_t pid,
int options, struct rusage __user *ru);
asmlinkage long sys_waitpid(pid_t pid, int __user *stat_addr, int options);
asmlinkage long sys_set_tid_address(int __user *tidptr);
-asmlinkage long sys_futex(u32 __user *uaddr, int op, int val,
+asmlinkage long sys_futex(u32 __user *uaddr, int op, u32 val,
struct timespec __user *utime, u32 __user *uaddr2,
- int val3);
+ u32 val3);
asmlinkage long sys_init_module(void __user *umod, unsigned long len,
const char __user *uargs);
diff --git a/kernel/futex.c b/kernel/futex.c
index e1a380c77a5a..50356fb5d726 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -63,7 +63,7 @@ union futex_key {
int offset;
} shared;
struct {
- unsigned long uaddr;
+ unsigned long address;
struct mm_struct *mm;
int offset;
} private;
@@ -87,13 +87,13 @@ struct futex_q {
struct list_head list;
wait_queue_head_t waiters;
- /* Which hash list lock to use. */
+ /* Which hash list lock to use: */
spinlock_t *lock_ptr;
- /* Key which the futex is hashed on. */
+ /* Key which the futex is hashed on: */
union futex_key key;
- /* For fd, sigio sent using these. */
+ /* For fd, sigio sent using these: */
int fd;
struct file *filp;
};
@@ -144,8 +144,9 @@ static inline int match_futex(union futex_key *key1, union futex_key *key2)
*
* Should be called with ¤t->mm->mmap_sem but NOT any spinlocks.
*/
-static int get_futex_key(unsigned long uaddr, union futex_key *key)
+static int get_futex_key(u32 __user *uaddr, union futex_key *key)
{
+ unsigned long address = (unsigned long)uaddr;
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
struct page *page;
@@ -154,16 +155,16 @@ static int get_futex_key(unsigned long uaddr, union futex_key *key)
/*
* The futex address must be "naturally" aligned.
*/
- key->both.offset = uaddr % PAGE_SIZE;
+ key->both.offset = address % PAGE_SIZE;
if (unlikely((key->both.offset % sizeof(u32)) != 0))
return -EINVAL;
- uaddr -= key->both.offset;
+ address -= key->both.offset;
/*
* The futex is hashed differently depending on whether
* it's in a shared or private mapping. So check vma first.
*/
- vma = find_extend_vma(mm, uaddr);
+ vma = find_extend_vma(mm, address);
if (unlikely(!vma))
return -EFAULT;
@@ -184,7 +185,7 @@ static int get_futex_key(unsigned long uaddr, union futex_key *key)
*/
if (likely(!(vma->vm_flags & VM_MAYSHARE))) {
key->private.mm = mm;
- key->private.uaddr = uaddr;
+ key->private.address = address;
return 0;
}
@@ -194,7 +195,7 @@ static int get_futex_key(unsigned long uaddr, union futex_key *key)
key->shared.inode = vma->vm_file->f_dentry->d_inode;
key->both.offset++; /* Bit 0 of offset indicates inode-based key. */
if (likely(!(vma->vm_flags & VM_NONLINEAR))) {
- key->shared.pgoff = (((uaddr - vma->vm_start) >> PAGE_SHIFT)
+ key->shared.pgoff = (((address - vma->vm_start) >> PAGE_SHIFT)
+ vma->vm_pgoff);
return 0;
}
@@ -205,7 +206,7 @@ static int get_futex_key(unsigned long uaddr, union futex_key *key)
* from swap. But that's a lot of code to duplicate here
* for a rare case, so we simply fetch the page.
*/
- err = get_user_pages(current, mm, uaddr, 1, 0, 0, &page, NULL);
+ err = get_user_pages(current, mm, address, 1, 0, 0, &page, NULL);
if (err >= 0) {
key->shared.pgoff =
page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
@@ -246,12 +247,12 @@ static void drop_key_refs(union futex_key *key)
}
}
-static inline int get_futex_value_locked(int *dest, int __user *from)
+static inline int get_futex_value_locked(u32 *dest, u32 __user *from)
{
int ret;
inc_preempt_count();
- ret = __copy_from_user_inatomic(dest, from, sizeof(int));
+ ret = __copy_from_user_inatomic(dest, from, sizeof(u32));
dec_preempt_count();
return ret ? -EFAULT : 0;
@@ -288,12 +289,12 @@ static void wake_futex(struct futex_q *q)
* Wake up all waiters hashed on the physical page that is mapped
* to this virtual address:
*/
-static int futex_wake(unsigned long uaddr, int nr_wake)
+static int futex_wake(u32 __user *uaddr, int nr_wake)
{
- union futex_key key;
- struct futex_hash_bucket *bh;
- struct list_head *head;
+ struct futex_hash_bucket *hb;
struct futex_q *this, *next;
+ struct list_head *head;
+ union futex_key key;
int ret;
down_read(¤t->mm->mmap_sem);
@@ -302,9 +303,9 @@ static int futex_wake(unsigned long uaddr, int nr_wake)
if (unlikely(ret != 0))
goto out;
- bh = hash_futex(&key);
- spin_lock(&bh->lock);
- head = &bh->chain;
+ hb = hash_futex(&key);
+ spin_lock(&hb->lock);
+ head = &hb->chain;
list_for_each_entry_safe(this, next, head, list) {
if (match_futex (&this->key, &key)) {
@@ -314,7 +315,7 @@ static int futex_wake(unsigned long uaddr, int nr_wake)
}
}
- spin_unlock(&bh->lock);
+ spin_unlock(&hb->lock);
out:
up_read(¤t->mm->mmap_sem);
return ret;
@@ -324,10 +325,12 @@ static int futex_wake(unsigned long uaddr, int nr_wake)
* Wake up all waiters hashed on the physical page that is mapped
* to this virtual address:
*/
-static int futex_wake_op(unsigned long uaddr1, unsigned long uaddr2, int nr_wake, int nr_wake2, int op)
+static int
+futex_wake_op(u32 __user *uaddr1, u32 __user *uaddr2,
+ int nr_wake, int nr_wake2, int op)
{
union futex_key key1, key2;
- struct futex_hash_bucket *bh1, *bh2;
+ struct futex_hash_bucket *hb1, *hb2;
struct list_head *head;
struct futex_q *this, *next;
int ret, op_ret, attempt = 0;
@@ -342,27 +345,29 @@ static int futex_wake_op(unsigned long uaddr1, unsigned long uaddr2, int nr_wake
if (unlikely(ret != 0))
goto out;
- bh1 = hash_futex(&key1);
- bh2 = hash_futex(&key2);
+ hb1 = hash_futex(&key1);
+ hb2 = hash_futex(&key2);
retry:
- if (bh1 < bh2)
- spin_lock(&bh1->lock);
- spin_lock(&bh2->lock);
- if (bh1 > bh2)
- spin_lock(&bh1->lock);
+ if (hb1 < hb2)
+ spin_lock(&hb1->lock);
+ spin_lock(&hb2->lock);
+ if (hb1 > hb2)
+ spin_lock(&hb1->lock);
- op_ret = futex_atomic_op_inuser(op, (int __user *)uaddr2);
+ op_ret = futex_atomic_op_inuser(op, uaddr2);
if (unlikely(op_ret < 0)) {
- int dummy;
+ u32 dummy;
- spin_unlock(&bh1->lock);
- if (bh1 != bh2)
- spin_unlock(&bh2->lock);
+ spin_unlock(&hb1->lock);
+ if (hb1 != hb2)
+ spin_unlock(&hb2->lock);
#ifndef CONFIG_MMU
- /* we don't get EFAULT from MMU faults if we don't have an MMU,
- * but we might get them from range checking */
+ /*
+ * we don't get EFAULT from MMU faults if we don't have an MMU,
+ * but we might get them from range checking
+ */
ret = op_ret;
goto out;
#endif
@@ -372,23 +377,26 @@ static int futex_wake_op(unsigned long uaddr1, unsigned long uaddr2, int nr_wake
goto out;
}
- /* futex_atomic_op_inuser needs to both read and write
+ /*
+ * futex_atomic_op_inuser needs to both read and write
* *(int __user *)uaddr2, but we can't modify it
* non-atomically. Therefore, if get_user below is not
* enough, we need to handle the fault ourselves, while
- * still holding the mmap_sem. */
+ * still holding the mmap_sem.
+ */
if (attempt++) {
struct vm_area_struct * vma;
struct mm_struct *mm = current->mm;
+ unsigned long address = (unsigned long)uaddr2;
ret = -EFAULT;
if (attempt >= 2 ||
- !(vma = find_vma(mm, uaddr2)) ||
- vma->vm_start > uaddr2 ||
+ !(vma = find_vma(mm, address)) ||
+ vma->vm_start > address ||
!(vma->vm_flags & VM_WRITE))
goto out;
- switch (handle_mm_fault(mm, vma, uaddr2, 1)) {
+ switch (handle_mm_fault(mm, vma, address, 1)) {
case VM_FAULT_MINOR:
current->min_flt++;
break;
@@ -401,18 +409,20 @@ static int futex_wake_op(unsigned long uaddr1, unsigned long uaddr2, int nr_wake
goto retry;
}
- /* If we would have faulted, release mmap_sem,
- * fault it in and start all over again. */
+ /*
+ * If we would have faulted, release mmap_sem,
+ * fault it in and start all over again.
+ */
up_read(¤t->mm->mmap_sem);
- ret = get_user(dummy, (int __user *)uaddr2);
+ ret = get_user(dummy, uaddr2);
if (ret)
return ret;
goto retryfull;
}
- head = &bh1->chain;
+ head = &hb1->chain;
list_for_each_entry_safe(this, next, head, list) {
if (match_futex (&this->key, &key1)) {
@@ -423,7 +433,7 @@ static int futex_wake_op(unsigned long uaddr1, unsigned long uaddr2, int nr_wake
}
if (op_ret > 0) {
- head = &bh2->chain;
+ head = &hb2->chain;
op_ret = 0;
list_for_each_entry_safe(this, next, head, list) {
@@ -436,9 +446,9 @@ static int futex_wake_op(unsigned long uaddr1, unsigned long uaddr2, int nr_wake
ret += op_ret;
}
- spin_unlock(&bh1->lock);
- if (bh1 != bh2)
- spin_unlock(&bh2->lock);
+ spin_unlock(&hb1->lock);
+ if (hb1 != hb2)
+ spin_unlock(&hb2->lock);
out:
up_read(¤t->mm->mmap_sem);
return ret;
@@ -448,11 +458,11 @@ static int futex_wake_op(unsigned long uaddr1, unsigned long uaddr2, int nr_wake
* Requeue all waiters hashed on one physical page to another
* physical page.
*/
-static int futex_requeue(unsigned long uaddr1, unsigned long uaddr2,
- int nr_wake, int nr_requeue, int *valp)
+static int futex_requeue(u32 __user *uaddr1, u32 __user *uaddr2,
+ int nr_wake, int nr_requeue, u32 *cmpval)
{
union futex_key key1, key2;
- struct futex_hash_bucket *bh1, *bh2;
+ struct futex_hash_bucket *hb1, *hb2;
struct list_head *head1;
struct futex_q *this, *next;
int ret, drop_count = 0;
@@ -467,68 +477,69 @@ static int futex_requeue(unsigned long uaddr1, unsigned long uaddr2,
if (unlikely(ret != 0))
goto out;
- bh1 = hash_futex(&key1);
- bh2 = hash_futex(&key2);
+ hb1 = hash_futex(&key1);
+ hb2 = hash_futex(&key2);
- if (bh1 < bh2)
- spin_lock(&bh1->lock);
- spin_lock(&bh2->lock);
- if (bh1 > bh2)
- spin_lock(&bh1->lock);
+ if (hb1 < hb2)
+ spin_lock(&hb1->lock);
+ spin_lock(&hb2->lock);
+ if (hb1 > hb2)
+ spin_lock(&hb1->lock);
- if (likely(valp != NULL)) {
- int curval;
+ if (likely(cmpval != NULL)) {
+ u32 curval;
- ret = get_futex_value_locked(&curval, (int __user *)uaddr1);
+ ret = get_futex_value_locked(&curval, uaddr1);
if (unlikely(ret)) {
- spin_unlock(&bh1->lock);
- if (bh1 != bh2)
- spin_unlock(&bh2->lock);
+ spin_unlock(&hb1->lock);
+ if (hb1 != hb2)
+ spin_unlock(&hb2->lock);
- /* If we would have faulted, release mmap_sem, fault
+ /*
+ * If we would have faulted, release mmap_sem, fault
* it in and start all over again.
*/
up_read(¤t->mm->mmap_sem);
- ret = get_user(curval, (int __user *)uaddr1);
+ ret = get_user(curval, uaddr1);
if (!ret)
goto retry;
return ret;
}
- if (curval != *valp) {
+ if (curval != *cmpval) {
ret = -EAGAIN;
goto out_unlock;
}
}
- head1 = &bh1->chain;
+ head1 = &hb1->chain;
list_for_each_entry_safe(this, next, head1, list) {
if (!match_futex (&this->key, &key1))
continue;
if (++ret <= nr_wake) {
wake_futex(this);
} else {
- list_move_tail(&this->list, &bh2->chain);
- this->lock_ptr = &bh2->lock;
+ list_move_tail(&this->list, &hb2->chain);
+ this->lock_ptr = &hb2->lock;
this->key = key2;
get_key_refs(&key2);
drop_count++;
if (ret - nr_wake >= nr_requeue)
break;
- /* Make sure to stop if key1 == key2 */
- if (head1 == &bh2->chain && head1 != &next->list)
+ /* Make sure to stop if key1 == key2: */
+ if (head1 == &hb2->chain && head1 != &next->list)
head1 = &this->list;
}
}
out_unlock:
- spin_unlock(&bh1->lock);
- if (bh1 != bh2)
- spin_unlock(&bh2->lock);
+ spin_unlock(&hb1->lock);
+ if (hb1 != hb2)
+ spin_unlock(&hb2->lock);
/* drop_key_refs() must be called outside the spinlocks. */
while (--drop_count >= 0)
@@ -543,7 +554,7 @@ static int futex_requeue(unsigned long uaddr1, unsigned long uaddr2,
static inline struct futex_hash_bucket *
queue_lock(struct futex_q *q, int fd, struct file *filp)
{
- struct futex_hash_bucket *bh;
+ struct futex_hash_bucket *hb;
q->fd = fd;
q->filp = filp;
@@ -551,23 +562,23 @@ queue_lock(struct futex_q *q, int fd, struct file *filp)
init_waitqueue_head(&q->waiters);
get_key_refs(&q->key);
- bh = hash_futex(&q->key);
- q->lock_ptr = &bh->lock;
+ hb = hash_futex(&q->key);
+ q->lock_ptr = &hb->lock;
- spin_lock(&bh->lock);
- return bh;
+ spin_lock(&hb->lock);
+ return hb;
}
-static inline void __queue_me(struct futex_q *q, struct futex_hash_bucket *bh)
+static inline void __queue_me(struct futex_q *q, struct futex_hash_bucket *hb)
{
- list_add_tail(&q->list, &bh->chain);
- spin_unlock(&bh->lock);
+ list_add_tail(&q->list, &hb->chain);
+ spin_unlock(&hb->lock);
}
static inline void
-queue_unlock(struct futex_q *q, struct futex_hash_bucket *bh)
+queue_unlock(struct futex_q *q, struct futex_hash_bucket *hb)
{
- spin_unlock(&bh->lock);
+ spin_unlock(&hb->lock);
drop_key_refs(&q->key);
}
@@ -579,16 +590,17 @@ queue_unlock(struct futex_q *q, struct futex_hash_bucket *bh)
/* The key must be already stored in q->key. */
static void queue_me(struct futex_q *q, int fd, struct file *filp)
{
- struct futex_hash_bucket *bh;
- bh = queue_lock(q, fd, filp);
- __queue_me(q, bh);
+ struct futex_hash_bucket *hb;
+
+ hb = queue_lock(q, fd, filp);
+ __queue_me(q, hb);
}
/* Return 1 if we were still queued (ie. 0 means we were woken) */
static int unqueue_me(struct futex_q *q)
{
- int ret = 0;
spinlock_t *lock_ptr;
+ int ret = 0;
/* In the common case we don't take the spinlock, which is nice. */
retry:
@@ -622,12 +634,13 @@ static int unqueue_me(struct futex_q *q)
return ret;
}
-static int futex_wait(unsigned long uaddr, int val, unsigned long time)
+static int futex_wait(u32 __user *uaddr, u32 val, unsigned long time)
{
DECLARE_WAITQUEUE(wait, current);
- int ret, curval;
+ struct futex_hash_bucket *hb;
struct futex_q q;
- struct futex_hash_bucket *bh;
+ u32 uval;
+ int ret;
retry:
down_read(¤t->mm->mmap_sem);
@@ -636,7 +649,7 @@ static int futex_wait(unsigned long uaddr, int val, unsigned long time)
if (unlikely(ret != 0))
goto out_release_sem;
- bh = queue_lock(&q, -1, NULL);
+ hb = queue_lock(&q, -1, NULL);
/*
* Access the page AFTER the futex is queued.
@@ -658,31 +671,31 @@ static int futex_wait(unsigned long uaddr, int val, unsigned long time)
* We hold the mmap semaphore, so the mapping cannot have changed
* since we looked it up in get_futex_key.
*/
-
- ret = get_futex_value_locked(&curval, (int __user *)uaddr);
+ ret = get_futex_value_locked(&uval, uaddr);
if (unlikely(ret)) {
- queue_unlock(&q, bh);
+ queue_unlock(&q, hb);
- /* If we would have faulted, release mmap_sem, fault it in and
+ /*
+ * If we would have faulted, release mmap_sem, fault it in and
* start all over again.
*/
up_read(¤t->mm->mmap_sem);
- ret = get_user(curval, (int __user *)uaddr);
+ ret = get_user(uval, uaddr);
if (!ret)
goto retry;
return ret;
}
- if (curval != val) {
+ if (uval != val) {
ret = -EWOULDBLOCK;
- queue_unlock(&q, bh);
+ queue_unlock(&q, hb);
goto out_release_sem;
}
/* Only actually queue if *uaddr contained val. */
- __queue_me(&q, bh);
+ __queue_me(&q, hb);
/*
* Now the futex is queued and we have checked the data, we
@@ -720,8 +733,10 @@ static int futex_wait(unsigned long uaddr, int val, unsigned long time)
return 0;
if (time == 0)
return -ETIMEDOUT;
- /* We expect signal_pending(current), but another thread may
- * have handled it for us already. */
+ /*
+ * We expect signal_pending(current), but another thread may
+ * have handled it for us already.
+ */
return -EINTR;
out_release_sem:
@@ -735,6 +750,7 @@ static int futex_close(struct inode *inode, struct file *filp)
unqueue_me(q);
kfree(q);
+
return 0;
}
@@ -766,7 +782,7 @@ static struct file_operations futex_fops = {
* Signal allows caller to avoid the race which would occur if they
* set the sigio stuff up afterwards.
*/
-static int futex_fd(unsigned long uaddr, int signal)
+static int futex_fd(u32 __user *uaddr, int signal)
{
struct futex_q *q;
struct file *filp;
@@ -937,7 +953,7 @@ int handle_futex_death(u32 __user *uaddr, struct task_struct *curr)
goto retry;
if (uval & FUTEX_WAITERS)
- futex_wake((unsigned long)uaddr, 1);
+ futex_wake(uaddr, 1);
}
return 0;
}
@@ -999,8 +1015,8 @@ void exit_robust_list(struct task_struct *curr)
}
}
-long do_futex(unsigned long uaddr, int op, int val, unsigned long timeout,
- unsigned long uaddr2, int val2, int val3)
+long do_futex(u32 __user *uaddr, int op, u32 val, unsigned long timeout,
+ u32 __user *uaddr2, u32 val2, u32 val3)
{
int ret;
@@ -1031,13 +1047,13 @@ long do_futex(unsigned long uaddr, int op, int val, unsigned long timeout,
}
-asmlinkage long sys_futex(u32 __user *uaddr, int op, int val,
+asmlinkage long sys_futex(u32 __user *uaddr, int op, u32 val,
struct timespec __user *utime, u32 __user *uaddr2,
- int val3)
+ u32 val3)
{
struct timespec t;
unsigned long timeout = MAX_SCHEDULE_TIMEOUT;
- int val2 = 0;
+ u32 val2 = 0;
if (utime && (op == FUTEX_WAIT)) {
if (copy_from_user(&t, utime, sizeof(t)) != 0)
@@ -1050,10 +1066,9 @@ asmlinkage long sys_futex(u32 __user *uaddr, int op, int val,
* requeue parameter in 'utime' if op == FUTEX_REQUEUE.
*/
if (op >= FUTEX_REQUEUE)
- val2 = (int) (unsigned long) utime;
+ val2 = (u32) (unsigned long) utime;
- return do_futex((unsigned long)uaddr, op, val, timeout,
- (unsigned long)uaddr2, val2, val3);
+ return do_futex(uaddr, op, val, timeout, uaddr2, val2, val3);
}
static int futexfs_get_sb(struct file_system_type *fs_type,
diff --git a/kernel/futex_compat.c b/kernel/futex_compat.c
index 1ab6a0ea3d14..7e57c31670a3 100644
--- a/kernel/futex_compat.c
+++ b/kernel/futex_compat.c
@@ -139,6 +139,5 @@ asmlinkage long compat_sys_futex(u32 __user *uaddr, int op, u32 val,
if (op >= FUTEX_REQUEUE)
val2 = (int) (unsigned long) utime;
- return do_futex((unsigned long)uaddr, op, val, timeout,
- (unsigned long)uaddr2, val2, val3);
+ return do_futex(uaddr, op, val, timeout, uaddr2, val2, val3);
}
commit 34af946a22724c4e2b204957f2b24b22a0fb121c
Author: Ingo Molnar <mingo@elte.hu>
Date: Tue Jun 27 02:53:55 2006 -0700
[PATCH] spin/rwlock init cleanups
locking init cleanups:
- convert " = SPIN_LOCK_UNLOCKED" to spin_lock_init() or DEFINE_SPINLOCK()
- convert rwlocks in a similar manner
this patch was generated automatically.
Motivation:
- cleanliness
- lockdep needs control of lock initialization, which the open-coded
variants do not give
- it's also useful for -rt and for lock debugging in general
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
diff --git a/arch/ia64/sn/kernel/irq.c b/arch/ia64/sn/kernel/irq.c
index dc8e2b696713..677c6c0fd661 100644
--- a/arch/ia64/sn/kernel/irq.c
+++ b/arch/ia64/sn/kernel/irq.c
@@ -27,7 +27,7 @@ static void unregister_intr_pda(struct sn_irq_info *sn_irq_info);
int sn_force_interrupt_flag = 1;
extern int sn_ioif_inited;
struct list_head **sn_irq_lh;
-static spinlock_t sn_irq_info_lock = SPIN_LOCK_UNLOCKED; /* non-IRQ lock */
+static DEFINE_SPINLOCK(sn_irq_info_lock); /* non-IRQ lock */
u64 sn_intr_alloc(nasid_t local_nasid, int local_widget,
struct sn_irq_info *sn_irq_info,
diff --git a/arch/mips/kernel/smtc.c b/arch/mips/kernel/smtc.c
index 2e8e52c135e6..70cf09afdf56 100644
--- a/arch/mips/kernel/smtc.c
+++ b/arch/mips/kernel/smtc.c
@@ -367,7 +367,7 @@ void mipsmt_prepare_cpus(void)
dvpe();
dmt();
- freeIPIq.lock = SPIN_LOCK_UNLOCKED;
+ spin_lock_init(&freeIPIq.lock);
/*
* We probably don't have as many VPEs as we do SMP "CPUs",
@@ -375,7 +375,7 @@ void mipsmt_prepare_cpus(void)
*/
for (i=0; i<NR_CPUS; i++) {
IPIQ[i].head = IPIQ[i].tail = NULL;
- IPIQ[i].lock = SPIN_LOCK_UNLOCKED;
+ spin_lock_init(&IPIQ[i].lock);
IPIQ[i].depth = 0;
ipi_timer_latch[i] = 0;
}
diff --git a/arch/powerpc/platforms/cell/spufs/switch.c b/arch/powerpc/platforms/cell/spufs/switch.c
index 3068b429b031..a656d810a44a 100644
--- a/arch/powerpc/platforms/cell/spufs/switch.c
+++ b/arch/powerpc/platforms/cell/spufs/switch.c
@@ -2203,7 +2203,7 @@ void spu_init_csa(struct spu_state *csa)
memset(lscsa, 0, sizeof(struct spu_lscsa));
csa->lscsa = lscsa;
- csa->register_lock = SPIN_LOCK_UNLOCKED;
+ spin_lock_init(&csa->register_lock);
/* Set LS pages reserved to allow for user-space mapping. */
for (p = lscsa->ls; p < lscsa->ls + LS_SIZE; p += PAGE_SIZE)
diff --git a/arch/powerpc/platforms/powermac/pfunc_core.c b/arch/powerpc/platforms/powermac/pfunc_core.c
index 047f954a89eb..93e7505debc5 100644
--- a/arch/powerpc/platforms/powermac/pfunc_core.c
+++ b/arch/powerpc/platforms/powermac/pfunc_core.c
@@ -546,7 +546,7 @@ struct pmf_device {
};
static LIST_HEAD(pmf_devices);
-static spinlock_t pmf_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(pmf_lock);
static DEFINE_MUTEX(pmf_irq_mutex);
static void pmf_release_device(struct kref *kref)
diff --git a/arch/powerpc/platforms/pseries/eeh_event.c b/arch/powerpc/platforms/pseries/eeh_event.c
index 8f2d12935b99..45ccc687e57c 100644
--- a/arch/powerpc/platforms/pseries/eeh_event.c
+++ b/arch/powerpc/platforms/pseries/eeh_event.c
@@ -35,7 +35,7 @@
*/
/* EEH event workqueue setup. */
-static spinlock_t eeh_eventlist_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(eeh_eventlist_lock);
LIST_HEAD(eeh_eventlist);
static void eeh_thread_launcher(void *);
DECLARE_WORK(eeh_event_wq, eeh_thread_launcher, NULL);
diff --git a/arch/powerpc/sysdev/mmio_nvram.c b/arch/powerpc/sysdev/mmio_nvram.c
index 74e0d31a3559..615350d46b52 100644
--- a/arch/powerpc/sysdev/mmio_nvram.c
+++ b/arch/powerpc/sysdev/mmio_nvram.c
@@ -32,7 +32,7 @@
static void __iomem *mmio_nvram_start;
static long mmio_nvram_len;
-static spinlock_t mmio_nvram_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(mmio_nvram_lock);
static ssize_t mmio_nvram_read(char *buf, size_t count, loff_t *index)
{
diff --git a/arch/xtensa/kernel/time.c b/arch/xtensa/kernel/time.c
index 937d81f62f43..fe14909f45e0 100644
--- a/arch/xtensa/kernel/time.c
+++ b/arch/xtensa/kernel/time.c
@@ -29,7 +29,7 @@
extern volatile unsigned long wall_jiffies;
-spinlock_t rtc_lock = SPIN_LOCK_UNLOCKED;
+DEFINE_SPINLOCK(rtc_lock);
EXPORT_SYMBOL(rtc_lock);
diff --git a/arch/xtensa/kernel/traps.c b/arch/xtensa/kernel/traps.c
index 225d64d73f04..27e409089a7b 100644
--- a/arch/xtensa/kernel/traps.c
+++ b/arch/xtensa/kernel/traps.c
@@ -461,7 +461,7 @@ void show_code(unsigned int *pc)
}
}
-spinlock_t die_lock = SPIN_LOCK_UNLOCKED;
+DEFINE_SPINLOCK(die_lock);
void die(const char * str, struct pt_regs * regs, long err)
{
diff --git a/drivers/char/drm/drm_memory_debug.h b/drivers/char/drm/drm_memory_debug.h
index 6543b9a14c42..d117cc997192 100644
--- a/drivers/char/drm/drm_memory_debug.h
+++ b/drivers/char/drm/drm_memory_debug.h
@@ -43,7 +43,7 @@ typedef struct drm_mem_stats {
unsigned long bytes_freed;
} drm_mem_stats_t;
-static spinlock_t drm_mem_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(drm_mem_lock);
static unsigned long drm_ram_available = 0; /* In pages */
static unsigned long drm_ram_used = 0;
static drm_mem_stats_t drm_mem_stats[] =
diff --git a/drivers/char/drm/via_dmablit.c b/drivers/char/drm/via_dmablit.c
index b7f17457b424..78a81a4a99c5 100644
--- a/drivers/char/drm/via_dmablit.c
+++ b/drivers/char/drm/via_dmablit.c
@@ -557,7 +557,7 @@ via_init_dmablit(drm_device_t *dev)
blitq->num_outstanding = 0;
blitq->is_active = 0;
blitq->aborting = 0;
- blitq->blit_lock = SPIN_LOCK_UNLOCKED;
+ spin_lock_init(&blitq->blit_lock);
for (j=0; j<VIA_NUM_BLIT_SLOTS; ++j) {
DRM_INIT_WAITQUEUE(blitq->blit_queue + j);
}
diff --git a/drivers/char/epca.c b/drivers/char/epca.c
index 9cad8501d62c..dc0602ae8503 100644
--- a/drivers/char/epca.c
+++ b/drivers/char/epca.c
@@ -80,7 +80,7 @@ static int invalid_lilo_config;
/* The ISA boards do window flipping into the same spaces so its only sane
with a single lock. It's still pretty efficient */
-static spinlock_t epca_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(epca_lock);
/* -----------------------------------------------------------------------
MAXBOARDS is typically 12, but ISA and EISA cards are restricted to
diff --git a/drivers/char/moxa.c b/drivers/char/moxa.c
index f43c2e04eadd..01247cccb89f 100644
--- a/drivers/char/moxa.c
+++ b/drivers/char/moxa.c
@@ -301,7 +301,7 @@ static struct tty_operations moxa_ops = {
.tiocmset = moxa_tiocmset,
};
-static spinlock_t moxa_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(moxa_lock);
#ifdef CONFIG_PCI
static int moxa_get_PCI_conf(struct pci_dev *p, int board_type, moxa_board_conf * board)
diff --git a/drivers/char/specialix.c b/drivers/char/specialix.c
index 1b5330299e30..d2d6b01dcd05 100644
--- a/drivers/char/specialix.c
+++ b/drivers/char/specialix.c
@@ -2477,7 +2477,7 @@ static int __init specialix_init(void)
#endif
for (i = 0; i < SX_NBOARD; i++)
- sx_board[i].lock = SPIN_LOCK_UNLOCKED;
+ spin_lock_init(&sx_board[i].lock);
if (sx_init_drivers()) {
func_exit();
diff --git a/drivers/char/sx.c b/drivers/char/sx.c
index 3b4747230270..76b9107f7f81 100644
--- a/drivers/char/sx.c
+++ b/drivers/char/sx.c
@@ -2320,7 +2320,7 @@ static int sx_init_portstructs (int nboards, int nports)
#ifdef NEW_WRITE_LOCKING
port->gs.port_write_mutex = MUTEX;
#endif
- port->gs.driver_lock = SPIN_LOCK_UNLOCKED;
+ spin_lock_init(&port->gs.driver_lock);
/*
* Initializing wait queue
*/
diff --git a/drivers/isdn/gigaset/common.c b/drivers/isdn/gigaset/common.c
index acb7e2656780..2a56bf33a673 100644
--- a/drivers/isdn/gigaset/common.c
+++ b/drivers/isdn/gigaset/common.c
@@ -981,7 +981,7 @@ void gigaset_stop(struct cardstate *cs)
EXPORT_SYMBOL_GPL(gigaset_stop);
static LIST_HEAD(drivers);
-static spinlock_t driver_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(driver_lock);
struct cardstate *gigaset_get_cs_by_id(int id)
{
diff --git a/drivers/leds/led-core.c b/drivers/leds/led-core.c
index fe6541326c71..9b015f9af351 100644
--- a/drivers/leds/led-core.c
+++ b/drivers/leds/led-core.c
@@ -18,7 +18,7 @@
#include <linux/leds.h>
#include "leds.h"
-rwlock_t leds_list_lock = RW_LOCK_UNLOCKED;
+DEFINE_RWLOCK(leds_list_lock);
LIST_HEAD(leds_list);
EXPORT_SYMBOL_GPL(leds_list);
diff --git a/drivers/leds/led-triggers.c b/drivers/leds/led-triggers.c
index 5e2cd8be1191..1b1ce6523960 100644
--- a/drivers/leds/led-triggers.c
+++ b/drivers/leds/led-triggers.c
@@ -26,7 +26,7 @@
/*
* Nests outside led_cdev->trigger_lock
*/
-static rwlock_t triggers_list_lock = RW_LOCK_UNLOCKED;
+static DEFINE_RWLOCK(triggers_list_lock);
static LIST_HEAD(trigger_list);
ssize_t led_trigger_store(struct class_device *dev, const char *buf,
diff --git a/drivers/misc/ibmasm/module.c b/drivers/misc/ibmasm/module.c
index 1fdf03fd2da7..9706cc19134a 100644
--- a/drivers/misc/ibmasm/module.c
+++ b/drivers/misc/ibmasm/module.c
@@ -85,7 +85,7 @@ static int __devinit ibmasm_init_one(struct pci_dev *pdev, const struct pci_devi
}
memset(sp, 0, sizeof(struct service_processor));
- sp->lock = SPIN_LOCK_UNLOCKED;
+ spin_lock_init(&sp->lock);
INIT_LIST_HEAD(&sp->command_queue);
pci_set_drvdata(pdev, (void *)sp);
diff --git a/drivers/pcmcia/m8xx_pcmcia.c b/drivers/pcmcia/m8xx_pcmcia.c
index 0e07d9535116..d0f68ab8f041 100644
--- a/drivers/pcmcia/m8xx_pcmcia.c
+++ b/drivers/pcmcia/m8xx_pcmcia.c
@@ -157,7 +157,7 @@ MODULE_LICENSE("Dual MPL/GPL");
static int pcmcia_schlvl = PCMCIA_SCHLVL;
-static spinlock_t events_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(events_lock);
#define PCMCIA_SOCKET_KEY_5V 1
@@ -644,7 +644,7 @@ static struct platform_device m8xx_device = {
};
static u32 pending_events[PCMCIA_SOCKETS_NO];
-static spinlock_t pending_event_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(pending_event_lock);
static irqreturn_t m8xx_interrupt(int irq, void *dev, struct pt_regs *regs)
{
diff --git a/drivers/rapidio/rio-access.c b/drivers/rapidio/rio-access.c
index b9fab2ae3a36..8b56bbdd011e 100644
--- a/drivers/rapidio/rio-access.c
+++ b/drivers/rapidio/rio-access.c
@@ -17,8 +17,8 @@
* These interrupt-safe spinlocks protect all accesses to RIO
* configuration space and doorbell access.
*/
-static spinlock_t rio_config_lock = SPIN_LOCK_UNLOCKED;
-static spinlock_t rio_doorbell_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(rio_config_lock);
+static DEFINE_SPINLOCK(rio_doorbell_lock);
/*
* Wrappers for all RIO configuration access functions. They just check
diff --git a/drivers/rtc/rtc-sa1100.c b/drivers/rtc/rtc-sa1100.c
index ab486fbc828d..9cd1cb304bb2 100644
--- a/drivers/rtc/rtc-sa1100.c
+++ b/drivers/rtc/rtc-sa1100.c
@@ -45,7 +45,7 @@
static unsigned long rtc_freq = 1024;
static struct rtc_time rtc_alarm;
-static spinlock_t sa1100_rtc_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(sa1100_rtc_lock);
static int rtc_update_alarm(struct rtc_time *alrm)
{
diff --git a/drivers/rtc/rtc-vr41xx.c b/drivers/rtc/rtc-vr41xx.c
index 33e029207e26..4b9291dd4443 100644
--- a/drivers/rtc/rtc-vr41xx.c
+++ b/drivers/rtc/rtc-vr41xx.c
@@ -93,7 +93,7 @@ static void __iomem *rtc2_base;
static unsigned long epoch = 1970; /* Jan 1 1970 00:00:00 */
-static spinlock_t rtc_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(rtc_lock);
static char rtc_name[] = "RTC";
static unsigned long periodic_frequency;
static unsigned long periodic_count;
diff --git a/drivers/s390/block/dasd_eer.c b/drivers/s390/block/dasd_eer.c
index 2d946b6ca074..2d8af709947f 100644
--- a/drivers/s390/block/dasd_eer.c
+++ b/drivers/s390/block/dasd_eer.c
@@ -89,7 +89,7 @@ struct eerbuffer {
};
static LIST_HEAD(bufferlist);
-static spinlock_t bufferlock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(bufferlock);
static DECLARE_WAIT_QUEUE_HEAD(dasd_eer_read_wait_queue);
/*
diff --git a/drivers/scsi/libata-core.c b/drivers/scsi/libata-core.c
index 6c66877be2bf..855ce9a9d948 100644
--- a/drivers/scsi/libata-core.c
+++ b/drivers/scsi/libata-core.c
@@ -5733,7 +5733,7 @@ module_init(ata_init);
module_exit(ata_exit);
static unsigned long ratelimit_time;
-static spinlock_t ata_ratelimit_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(ata_ratelimit_lock);
int ata_ratelimit(void)
{
diff --git a/drivers/sn/ioc3.c b/drivers/sn/ioc3.c
index 501316b198e5..ed946311d3a4 100644
--- a/drivers/sn/ioc3.c
+++ b/drivers/sn/ioc3.c
@@ -26,7 +26,7 @@ static DECLARE_RWSEM(ioc3_devices_rwsem);
static struct ioc3_submodule *ioc3_submodules[IOC3_MAX_SUBMODULES];
static struct ioc3_submodule *ioc3_ethernet;
-static rwlock_t ioc3_submodules_lock = RW_LOCK_UNLOCKED;
+static DEFINE_RWLOCK(ioc3_submodules_lock);
/* NIC probing code */
diff --git a/drivers/video/backlight/hp680_bl.c b/drivers/video/backlight/hp680_bl.c
index a71e984c93d4..ffc72ae3ada8 100644
--- a/drivers/video/backlight/hp680_bl.c
+++ b/drivers/video/backlight/hp680_bl.c
@@ -27,7 +27,7 @@
static int hp680bl_suspended;
static int current_intensity = 0;
-static spinlock_t bl_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(bl_lock);
static struct backlight_device *hp680_backlight_device;
static void hp680bl_send_intensity(struct backlight_device *bd)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 1630b5670dc2..7c7d01672d35 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -123,7 +123,7 @@ static void release_stateid(struct nfs4_stateid *stp, int flags);
*/
/* recall_lock protects the del_recall_lru */
-static spinlock_t recall_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(recall_lock);
static struct list_head del_recall_lru;
static void
diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index 21f38accd039..1d26cfcd9f84 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -54,7 +54,7 @@ static DECLARE_RWSEM(o2hb_callback_sem);
* multiple hb threads are watching multiple regions. A node is live
* whenever any of the threads sees activity from the node in its region.
*/
-static spinlock_t o2hb_live_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(o2hb_live_lock);
static struct list_head o2hb_live_slots[O2NM_MAX_NODES];
static unsigned long o2hb_live_node_bitmap[BITS_TO_LONGS(O2NM_MAX_NODES)];
static LIST_HEAD(o2hb_node_events);
diff --git a/fs/ocfs2/cluster/tcp.c b/fs/ocfs2/cluster/tcp.c
index 0f60cc0d3985..1591eb37a723 100644
--- a/fs/ocfs2/cluster/tcp.c
+++ b/fs/ocfs2/cluster/tcp.c
@@ -108,7 +108,7 @@
##args); \
} while (0)
-static rwlock_t o2net_handler_lock = RW_LOCK_UNLOCKED;
+static DEFINE_RWLOCK(o2net_handler_lock);
static struct rb_root o2net_handler_tree = RB_ROOT;
static struct o2net_node o2net_nodes[O2NM_MAX_NODES];
diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
index ba27c5c5e959..b8c23f7ba67e 100644
--- a/fs/ocfs2/dlm/dlmdomain.c
+++ b/fs/ocfs2/dlm/dlmdomain.c
@@ -88,7 +88,7 @@ static void **dlm_alloc_pagevec(int pages)
*
*/
-spinlock_t dlm_domain_lock = SPIN_LOCK_UNLOCKED;
+DEFINE_SPINLOCK(dlm_domain_lock);
LIST_HEAD(dlm_domains);
static DECLARE_WAIT_QUEUE_HEAD(dlm_domain_events);
diff --git a/fs/ocfs2/dlm/dlmlock.c b/fs/ocfs2/dlm/dlmlock.c
index d6f89577e25f..5ca57ec650c7 100644
--- a/fs/ocfs2/dlm/dlmlock.c
+++ b/fs/ocfs2/dlm/dlmlock.c
@@ -53,7 +53,7 @@
#define MLOG_MASK_PREFIX ML_DLM
#include "cluster/masklog.h"
-static spinlock_t dlm_cookie_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(dlm_cookie_lock);
static u64 dlm_next_cookie = 1;
static enum dlm_status dlm_send_remote_lock_request(struct dlm_ctxt *dlm,
diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
index da399013516f..29b2845f370d 100644
--- a/fs/ocfs2/dlm/dlmrecovery.c
+++ b/fs/ocfs2/dlm/dlmrecovery.c
@@ -98,8 +98,8 @@ static void dlm_mig_lockres_worker(struct dlm_work_item *item, void *data);
static u64 dlm_get_next_mig_cookie(void);
-static spinlock_t dlm_reco_state_lock = SPIN_LOCK_UNLOCKED;
-static spinlock_t dlm_mig_cookie_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(dlm_reco_state_lock);
+static DEFINE_SPINLOCK(dlm_mig_cookie_lock);
static u64 dlm_mig_cookie = 1;
static u64 dlm_get_next_mig_cookie(void)
diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 64cd52860c87..4acd37286bdd 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -242,7 +242,7 @@ static void ocfs2_build_lock_name(enum ocfs2_lock_type type,
mlog_exit_void();
}
-static spinlock_t ocfs2_dlm_tracking_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(ocfs2_dlm_tracking_lock);
static void ocfs2_add_lockres_tracking(struct ocfs2_lock_res *res,
struct ocfs2_dlm_debug *dlm_debug)
diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index 3fe8781c22cb..910a601b2e98 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -49,7 +49,7 @@
#include "buffer_head_io.h"
-spinlock_t trans_inc_lock = SPIN_LOCK_UNLOCKED;
+DEFINE_SPINLOCK(trans_inc_lock);
static int ocfs2_force_read_journal(struct inode *inode);
static int ocfs2_recover_node(struct ocfs2_super *osb,
diff --git a/include/asm-alpha/core_t2.h b/include/asm-alpha/core_t2.h
index dba70c62a16c..457c34b6eb09 100644
--- a/include/asm-alpha/core_t2.h
+++ b/include/asm-alpha/core_t2.h
@@ -435,7 +435,7 @@ static inline void t2_outl(u32 b, unsigned long addr)
set_hae(msb); \
}
-static spinlock_t t2_hae_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(t2_hae_lock);
__EXTERN_INLINE u8 t2_readb(const volatile void __iomem *xaddr)
{
diff --git a/kernel/audit.c b/kernel/audit.c
index 7dfac7031bd7..82443fb433ef 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -818,7 +818,7 @@ static struct audit_buffer * audit_buffer_alloc(struct audit_context *ctx,
*/
unsigned int audit_serial(void)
{
- static spinlock_t serial_lock = SPIN_LOCK_UNLOCKED;
+ static DEFINE_SPINLOCK(serial_lock);
static unsigned int serial = 0;
unsigned long flags;
diff --git a/mm/sparse.c b/mm/sparse.c
index e0a3fe48aa37..c7a2b3a0e46b 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -45,7 +45,7 @@ static struct mem_section *sparse_index_alloc(int nid)
static int sparse_index_init(unsigned long section_nr, int nid)
{
- static spinlock_t index_init_lock = SPIN_LOCK_UNLOCKED;
+ static DEFINE_SPINLOCK(index_init_lock);
unsigned long root = SECTION_NR_TO_ROOT(section_nr);
struct mem_section *section;
int ret = 0;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 8a777932786d..e728980160d2 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -349,7 +349,7 @@ static struct rt6_info *rt6_select(struct rt6_info **head, int oif,
(strict & RT6_SELECT_F_REACHABLE) &&
last && last != rt0) {
/* no entries matched; do round-robin */
- static spinlock_t lock = SPIN_LOCK_UNLOCKED;
+ static DEFINE_SPINLOCK(lock);
spin_lock(&lock);
*head = rt0->u.next;
rt0->u.next = last->u.next;
diff --git a/net/sunrpc/auth_gss/gss_krb5_seal.c b/net/sunrpc/auth_gss/gss_krb5_seal.c
index f43311221a72..2f312164d6d5 100644
--- a/net/sunrpc/auth_gss/gss_krb5_seal.c
+++ b/net/sunrpc/auth_gss/gss_krb5_seal.c
@@ -70,7 +70,7 @@
# define RPCDBG_FACILITY RPCDBG_AUTH
#endif
-spinlock_t krb5_seq_lock = SPIN_LOCK_UNLOCKED;
+DEFINE_SPINLOCK(krb5_seq_lock);
u32
gss_get_mic_kerberos(struct gss_ctx *gss_ctx, struct xdr_buf *text,
diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index 54128040a124..1bb75703f384 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -117,7 +117,7 @@ struct bclink {
static struct bcbearer *bcbearer = NULL;
static struct bclink *bclink = NULL;
static struct link *bcl = NULL;
-static spinlock_t bc_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(bc_lock);
char tipc_bclink_name[] = "multicast-link";
@@ -796,7 +796,7 @@ int tipc_bclink_init(void)
memset(bclink, 0, sizeof(struct bclink));
INIT_LIST_HEAD(&bcl->waiting_ports);
bcl->next_out_no = 1;
- bclink->node.lock = SPIN_LOCK_UNLOCKED;
+ spin_lock_init(&bclink->node.lock);
bcl->owner = &bclink->node;
bcl->max_pkt = MAX_PKT_DEFAULT_MCAST;
tipc_link_set_queue_limits(bcl, BCLINK_WIN_DEFAULT);
diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index 4fa24b5e8914..7ef17a449cfd 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -566,7 +566,7 @@ int tipc_enable_bearer(const char *name, u32 bcast_scope, u32 priority)
b_ptr->link_req = tipc_disc_init_link_req(b_ptr, &m_ptr->bcast_addr,
bcast_scope, 2);
}
- b_ptr->publ.lock = SPIN_LOCK_UNLOCKED;
+ spin_lock_init(&b_ptr->publ.lock);
write_unlock_bh(&tipc_net_lock);
info("Enabled bearer <%s>, discovery domain %s, priority %u\n",
name, addr_string_fill(addr_string, bcast_scope), priority);
diff --git a/net/tipc/config.c b/net/tipc/config.c
index 3ec502fac8c3..285e1bc2d880 100644
--- a/net/tipc/config.c
+++ b/net/tipc/config.c
@@ -63,7 +63,7 @@ struct manager {
static struct manager mng = { 0};
-static spinlock_t config_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(config_lock);
static const void *req_tlv_area; /* request message TLV area */
static int req_tlv_space; /* request message TLV area size */
diff --git a/net/tipc/dbg.c b/net/tipc/dbg.c
index 26ef95d5fe38..55130655e1ed 100644
--- a/net/tipc/dbg.c
+++ b/net/tipc/dbg.c
@@ -41,7 +41,7 @@
#define MAX_STRING 512
static char print_string[MAX_STRING];
-static spinlock_t print_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(print_lock);
static struct print_buf cons_buf = { NULL, 0, NULL, NULL };
struct print_buf *TIPC_CONS = &cons_buf;
diff --git a/net/tipc/handler.c b/net/tipc/handler.c
index 966f70a1b608..ae6ddf00a1aa 100644
--- a/net/tipc/handler.c
+++ b/net/tipc/handler.c
@@ -44,7 +44,7 @@ struct queue_item {
static kmem_cache_t *tipc_queue_item_cache;
static struct list_head signal_queue_head;
-static spinlock_t qitem_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(qitem_lock);
static int handler_enabled = 0;
static void process_signal_queue(unsigned long dummy);
diff --git a/net/tipc/name_table.c b/net/tipc/name_table.c
index 38571306aba5..a6926ff07bcc 100644
--- a/net/tipc/name_table.c
+++ b/net/tipc/name_table.c
@@ -101,7 +101,7 @@ struct name_table {
static struct name_table table = { NULL } ;
static atomic_t rsv_publ_ok = ATOMIC_INIT(0);
-rwlock_t tipc_nametbl_lock = RW_LOCK_UNLOCKED;
+DEFINE_RWLOCK(tipc_nametbl_lock);
static int hash(int x)
@@ -172,7 +172,7 @@ static struct name_seq *tipc_nameseq_create(u32 type, struct hlist_head *seq_hea
}
memset(nseq, 0, sizeof(*nseq));
- nseq->lock = SPIN_LOCK_UNLOCKED;
+ spin_lock_init(&nseq->lock);
nseq->type = type;
nseq->sseqs = sseq;
dbg("tipc_nameseq_create(): nseq = %p, type %u, ssseqs %p, ff: %u\n",
diff --git a/net/tipc/net.c b/net/tipc/net.c
index f7c8223ddf7d..e5a359ab4930 100644
--- a/net/tipc/net.c
+++ b/net/tipc/net.c
@@ -115,7 +115,7 @@
* - A local spin_lock protecting the queue of subscriber events.
*/
-rwlock_t tipc_net_lock = RW_LOCK_UNLOCKED;
+DEFINE_RWLOCK(tipc_net_lock);
struct network tipc_net = { NULL };
struct node *tipc_net_select_remote_node(u32 addr, u32 ref)
diff --git a/net/tipc/node.c b/net/tipc/node.c
index ce9678efa98a..861322b935da 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -77,7 +77,7 @@ struct node *tipc_node_create(u32 addr)
memset(n_ptr, 0, sizeof(*n_ptr));
n_ptr->addr = addr;
- n_ptr->lock = SPIN_LOCK_UNLOCKED;
+ spin_lock_init(&n_ptr->lock);
INIT_LIST_HEAD(&n_ptr->nsub);
n_ptr->owner = c_ptr;
tipc_cltr_attach_node(c_ptr, n_ptr);
diff --git a/net/tipc/port.c b/net/tipc/port.c
index 47d97404e3ee..3251c8d8e53c 100644
--- a/net/tipc/port.c
+++ b/net/tipc/port.c
@@ -57,8 +57,8 @@
static struct sk_buff *msg_queue_head = NULL;
static struct sk_buff *msg_queue_tail = NULL;
-spinlock_t tipc_port_list_lock = SPIN_LOCK_UNLOCKED;
-static spinlock_t queue_lock = SPIN_LOCK_UNLOCKED;
+DEFINE_SPINLOCK(tipc_port_list_lock);
+static DEFINE_SPINLOCK(queue_lock);
static LIST_HEAD(ports);
static void port_handle_node_down(unsigned long ref);
diff --git a/net/tipc/ref.c b/net/tipc/ref.c
index d2f0cce10e20..596d3c8ff750 100644
--- a/net/tipc/ref.c
+++ b/net/tipc/ref.c
@@ -63,7 +63,7 @@
struct ref_table tipc_ref_table = { NULL };
-static rwlock_t ref_table_lock = RW_LOCK_UNLOCKED;
+static DEFINE_RWLOCK(ref_table_lock);
/**
* tipc_ref_table_init - create reference table for objects
@@ -87,7 +87,7 @@ int tipc_ref_table_init(u32 requested_size, u32 start)
index_mask = sz - 1;
for (i = sz - 1; i >= 0; i--) {
table[i].object = NULL;
- table[i].lock = SPIN_LOCK_UNLOCKED;
+ spin_lock_init(&table[i].lock);
table[i].data.next_plus_upper = (start & ~index_mask) + i - 1;
}
tipc_ref_table.entries = table;
diff --git a/net/tipc/subscr.c b/net/tipc/subscr.c
index fc171875660c..e19b4bcd67ec 100644
--- a/net/tipc/subscr.c
+++ b/net/tipc/subscr.c
@@ -457,7 +457,7 @@ int tipc_subscr_start(void)
int res = -1;
memset(&topsrv, 0, sizeof (topsrv));
- topsrv.lock = SPIN_LOCK_UNLOCKED;
+ spin_lock_init(&topsrv.lock);
INIT_LIST_HEAD(&topsrv.subscriber_list);
spin_lock_bh(&topsrv.lock);
diff --git a/net/tipc/user_reg.c b/net/tipc/user_reg.c
index 3f3f933976e9..1e3ae57c7228 100644
--- a/net/tipc/user_reg.c
+++ b/net/tipc/user_reg.c
@@ -67,7 +67,7 @@ struct tipc_user {
static struct tipc_user *users = NULL;
static u32 next_free_user = MAX_USERID + 1;
-static spinlock_t reg_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(reg_lock);
/**
* reg_init - create TIPC user registry (but don't activate it)
commit e6e5494cb23d1933735ee47cc674ffe1c4afed6f
Author: Ingo Molnar <mingo@elte.hu>
Date: Tue Jun 27 02:53:50 2006 -0700
[PATCH] vdso: randomize the i386 vDSO by moving it into a vma
Move the i386 VDSO down into a vma and thus randomize it.
Besides the security implications, this feature also helps debuggers, which
can COW a vma-backed VDSO just like a normal DSO and can thus do
single-stepping and other debugging features.
It's good for hypervisors (Xen, VMWare) too, which typically live in the same
high-mapped address space as the VDSO, hence whenever the VDSO is used, they
get lots of guest pagefaults and have to fix such guest accesses up - which
slows things down instead of speeding things up (the primary purpose of the
VDSO).
There's a new CONFIG_COMPAT_VDSO (default=y) option, which provides support
for older glibcs that still rely on a prelinked high-mapped VDSO. Newer
distributions (using glibc 2.3.3 or later) can turn this option off. Turning
it off is also recommended for security reasons: attackers cannot use the
predictable high-mapped VDSO page as syscall trampoline anymore.
There is a new vdso=[0|1] boot option as well, and a runtime
/proc/sys/vm/vdso_enabled sysctl switch, that allows the VDSO to be turned
on/off.
(This version of the VDSO-randomization patch also has working ELF
coredumping, the previous patch crashed in the coredumping code.)
This code is a combined work of the exec-shield VDSO randomization
code and Gerd Hoffmann's hypervisor-centric VDSO patch. Rusty Russell
started this patch and i completed it.
[akpm@osdl.org: cleanups]
[akpm@osdl.org: compile fix]
[akpm@osdl.org: compile fix 2]
[akpm@osdl.org: compile fix 3]
[akpm@osdl.org: revernt MAXMEM change]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Cc: Gerd Hoffmann <kraxel@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 2e352a605fcf..0d189c93eeaf 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1669,6 +1669,10 @@ running once the system is up.
usbhid.mousepoll=
[USBHID] The interval which mice are to be polled at.
+ vdso= [IA-32]
+ vdso=1: enable VDSO (default)
+ vdso=0: disable VDSO mapping
+
video= [FB] Frame buffer configuration
See Documentation/fb/modedb.txt.
diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig
index 6662f8c44798..3bb221db164a 100644
--- a/arch/i386/Kconfig
+++ b/arch/i386/Kconfig
@@ -780,6 +780,17 @@ config HOTPLUG_CPU
enable suspend on SMP systems. CPUs can be controlled through
/sys/devices/system/cpu.
+config COMPAT_VDSO
+ bool "Compat VDSO support"
+ default y
+ help
+ Map the VDSO to the predictable old-style address too.
+ ---help---
+ Say N here if you are running a sufficiently recent glibc
+ version (2.3.3 or later), to remove the high-mapped
+ VDSO mapping and to exclusively use the randomized VDSO.
+
+ If unsure, say Y.
endmenu
diff --git a/arch/i386/kernel/asm-offsets.c b/arch/i386/kernel/asm-offsets.c
index 1c3a809e6421..c80271f8f084 100644
--- a/arch/i386/kernel/asm-offsets.c
+++ b/arch/i386/kernel/asm-offsets.c
@@ -14,6 +14,7 @@
#include <asm/fixmap.h>
#include <asm/processor.h>
#include <asm/thread_info.h>
+#include <asm/elf.h>
#define DEFINE(sym, val) \
asm volatile("\n->" #sym " %0 " #val : : "i" (val))
@@ -54,6 +55,7 @@ void foo(void)
OFFSET(TI_preempt_count, thread_info, preempt_count);
OFFSET(TI_addr_limit, thread_info, addr_limit);
OFFSET(TI_restart_block, thread_info, restart_block);
+ OFFSET(TI_sysenter_return, thread_info, sysenter_return);
BLANK();
OFFSET(EXEC_DOMAIN_handler, exec_domain, handler);
@@ -69,7 +71,7 @@ void foo(void)
sizeof(struct tss_struct));
DEFINE(PAGE_SIZE_asm, PAGE_SIZE);
- DEFINE(VSYSCALL_BASE, __fix_to_virt(FIX_VSYSCALL));
+ DEFINE(VDSO_PRELINK, VDSO_PRELINK);
OFFSET(crypto_tfm_ctx_offset, crypto_tfm, __crt_ctx);
}
diff --git a/arch/i386/kernel/entry.S b/arch/i386/kernel/entry.S
index e8d2630fd19a..fbdb933251b6 100644
--- a/arch/i386/kernel/entry.S
+++ b/arch/i386/kernel/entry.S
@@ -270,7 +270,12 @@ sysenter_past_esp:
pushl $(__USER_CS)
CFI_ADJUST_CFA_OFFSET 4
/*CFI_REL_OFFSET cs, 0*/
- pushl $SYSENTER_RETURN
+ /*
+ * Push current_thread_info()->sysenter_return to the stack.
+ * A tiny bit of offset fixup is necessary - 4*4 means the 4 words
+ * pushed above; +8 corresponds to copy_thread's esp0 setting.
+ */
+ pushl (TI_sysenter_return-THREAD_SIZE+8+4*4)(%esp)
CFI_ADJUST_CFA_OFFSET 4
CFI_REL_OFFSET eip, 0
diff --git a/arch/i386/kernel/signal.c b/arch/i386/kernel/signal.c
index 5c352c3a9e7f..43002cfb40c4 100644
--- a/arch/i386/kernel/signal.c
+++ b/arch/i386/kernel/signal.c
@@ -351,7 +351,7 @@ static int setup_frame(int sig, struct k_sigaction *ka,
goto give_sigsegv;
}
- restorer = &__kernel_sigreturn;
+ restorer = (void *)VDSO_SYM(&__kernel_sigreturn);
if (ka->sa.sa_flags & SA_RESTORER)
restorer = ka->sa.sa_restorer;
@@ -447,7 +447,7 @@ static int setup_rt_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
goto give_sigsegv;
/* Set up to return from userspace. */
- restorer = &__kernel_rt_sigreturn;
+ restorer = (void *)VDSO_SYM(&__kernel_rt_sigreturn);
if (ka->sa.sa_flags & SA_RESTORER)
restorer = ka->sa.sa_restorer;
err |= __put_user(restorer, &frame->pretcode);
diff --git a/arch/i386/kernel/sysenter.c b/arch/i386/kernel/sysenter.c
index 0bada1870bdf..c60419dee018 100644
--- a/arch/i386/kernel/sysenter.c
+++ b/arch/i386/kernel/sysenter.c
@@ -2,6 +2,8 @@
* linux/arch/i386/kernel/sysenter.c
*
* (C) Copyright 2002 Linus Torvalds
+ * Portions based on the vdso-randomization code from exec-shield:
+ * Copyright(C) 2005-2006, Red Hat, Inc., Ingo Molnar
*
* This file contains the needed initializations to support sysenter.
*/
@@ -13,12 +15,31 @@
#include <linux/gfp.h>
#include <linux/string.h>
#include <linux/elf.h>
+#include <linux/mm.h>
+#include <linux/module.h>
#include <asm/cpufeature.h>
#include <asm/msr.h>
#include <asm/pgtable.h>
#include <asm/unistd.h>
+/*
+ * Should the kernel map a VDSO page into processes and pass its
+ * address down to glibc upon exec()?
+ */
+unsigned int __read_mostly vdso_enabled = 1;
+
+EXPORT_SYMBOL_GPL(vdso_enabled);
+
+static int __init vdso_setup(char *s)
+{
+ vdso_enabled = simple_strtoul(s, NULL, 0);
+
+ return 1;
+}
+
+__setup("vdso=", vdso_setup);
+
extern asmlinkage void sysenter_entry(void);
void enable_sep_cpu(void)
@@ -45,23 +66,122 @@ void enable_sep_cpu(void)
*/
extern const char vsyscall_int80_start, vsyscall_int80_end;
extern const char vsyscall_sysenter_start, vsyscall_sysenter_end;
+static void *syscall_page;
int __init sysenter_setup(void)
{
- void *page = (void *)get_zeroed_page(GFP_ATOMIC);
+ syscall_page = (void *)get_zeroed_page(GFP_ATOMIC);
- __set_fixmap(FIX_VSYSCALL, __pa(page), PAGE_READONLY_EXEC);
+#ifdef CONFIG_COMPAT_VDSO
+ __set_fixmap(FIX_VDSO, __pa(syscall_page), PAGE_READONLY);
+ printk("Compat vDSO mapped to %08lx.\n", __fix_to_virt(FIX_VDSO));
+#else
+ /*
+ * In the non-compat case the ELF coredumping code needs the fixmap:
+ */
+ __set_fixmap(FIX_VDSO, __pa(syscall_page), PAGE_KERNEL_RO);
+#endif
if (!boot_cpu_has(X86_FEATURE_SEP)) {
- memcpy(page,
+ memcpy(syscall_page,
&vsyscall_int80_start,
&vsyscall_int80_end - &vsyscall_int80_start);
return 0;
}
- memcpy(page,
+ memcpy(syscall_page,
&vsyscall_sysenter_start,
&vsyscall_sysenter_end - &vsyscall_sysenter_start);
return 0;
}
+
+static struct page *syscall_nopage(struct vm_area_struct *vma,
+ unsigned long adr, int *type)
+{
+ struct page *p = virt_to_page(adr - vma->vm_start + syscall_page);
+ get_page(p);
+ return p;
+}
+
+/* Prevent VMA merging */
+static void syscall_vma_close(struct vm_area_struct *vma)
+{
+}
+
+static struct vm_operations_struct syscall_vm_ops = {
+ .close = syscall_vma_close,
+ .nopage = syscall_nopage,
+};
+
+/* Defined in vsyscall-sysenter.S */
+extern void SYSENTER_RETURN;
+
+/* Setup a VMA at program startup for the vsyscall page */
+int arch_setup_additional_pages(struct linux_binprm *bprm, int exstack)
+{
+ struct vm_area_struct *vma;
+ struct mm_struct *mm = current->mm;
+ unsigned long addr;
+ int ret;
+
+ down_write(&mm->mmap_sem);
+ addr = get_unmapped_area(NULL, 0, PAGE_SIZE, 0, 0);
+ if (IS_ERR_VALUE(addr)) {
+ ret = addr;
+ goto up_fail;
+ }
+
+ vma = kmem_cache_zalloc(vm_area_cachep, SLAB_KERNEL);
+ if (!vma) {
+ ret = -ENOMEM;
+ goto up_fail;
+ }
+
+ vma->vm_start = addr;
+ vma->vm_end = addr + PAGE_SIZE;
+ /* MAYWRITE to allow gdb to COW and set breakpoints */
+ vma->vm_flags = VM_READ|VM_EXEC|VM_MAYREAD|VM_MAYEXEC|VM_MAYWRITE;
+ vma->vm_flags |= mm->def_flags;
+ vma->vm_page_prot = protection_map[vma->vm_flags & 7];
+ vma->vm_ops = &syscall_vm_ops;
+ vma->vm_mm = mm;
+
+ ret = insert_vm_struct(mm, vma);
+ if (ret)
+ goto free_vma;
+
+ current->mm->context.vdso = (void *)addr;
+ current_thread_info()->sysenter_return =
+ (void *)VDSO_SYM(&SYSENTER_RETURN);
+ mm->total_vm++;
+up_fail:
+ up_write(&mm->mmap_sem);
+ return ret;
+
+free_vma:
+ kmem_cache_free(vm_area_cachep, vma);
+ return ret;
+}
+
+const char *arch_vma_name(struct vm_area_struct *vma)
+{
+ if (vma->vm_mm && vma->vm_start == (long)vma->vm_mm->context.vdso)
+ return "[vdso]";
+ return NULL;
+}
+
+struct vm_area_struct *get_gate_vma(struct task_struct *tsk)
+{
+ return NULL;
+}
+
+int in_gate_area(struct task_struct *task, unsigned long addr)
+{
+ return 0;
+}
+
+int in_gate_area_no_task(unsigned long addr)
+{
+ return 0;
+}
diff --git a/arch/i386/kernel/vsyscall-sysenter.S b/arch/i386/kernel/vsyscall-sysenter.S
index 3b62baa6a371..1a36d26e15eb 100644
--- a/arch/i386/kernel/vsyscall-sysenter.S
+++ b/arch/i386/kernel/vsyscall-sysenter.S
@@ -42,10 +42,10 @@ __kernel_vsyscall:
/* 7: align return point with nop's to make disassembly easier */
.space 7,0x90
- /* 14: System call restart point is here! (SYSENTER_RETURN - 2) */
+ /* 14: System call restart point is here! (SYSENTER_RETURN-2) */
jmp .Lenter_kernel
/* 16: System call normal return point is here! */
- .globl SYSENTER_RETURN /* Symbol used by entry.S. */
+ .globl SYSENTER_RETURN /* Symbol used by sysenter.c */
SYSENTER_RETURN:
pop %ebp
.Lpop_ebp:
diff --git a/arch/i386/kernel/vsyscall.lds.S b/arch/i386/kernel/vsyscall.lds.S
index 98699ca6e52d..e26975fc68b6 100644
--- a/arch/i386/kernel/vsyscall.lds.S
+++ b/arch/i386/kernel/vsyscall.lds.S
@@ -7,7 +7,7 @@
SECTIONS
{
- . = VSYSCALL_BASE + SIZEOF_HEADERS;
+ . = VDSO_PRELINK + SIZEOF_HEADERS;
.hash : { *(.hash) } :text
.dynsym : { *(.dynsym) }
@@ -20,7 +20,7 @@ SECTIONS
For the layouts to match, we need to skip more than enough
space for the dynamic symbol table et al. If this amount
is insufficient, ld -shared will barf. Just increase it here. */
- . = VSYSCALL_BASE + 0x400;
+ . = VDSO_PRELINK + 0x400;
.text : { *(.text) } :text =0x90909090
.note : { *(.note.*) } :text :note
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 0137ec4c1368..0a163a4f7764 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -122,6 +122,11 @@ struct mem_size_stats
unsigned long private_dirty;
};
+__attribute__((weak)) const char *arch_vma_name(struct vm_area_struct *vma)
+{
+ return NULL;
+}
+
static int show_map_internal(struct seq_file *m, void *v, struct mem_size_stats *mss)
{
struct proc_maps_private *priv = m->private;
@@ -158,22 +163,23 @@ static int show_map_internal(struct seq_file *m, void *v, struct mem_size_stats
pad_len_spaces(m, len);
seq_path(m, file->f_vfsmnt, file->f_dentry, "\n");
} else {
- if (mm) {
- if (vma->vm_start <= mm->start_brk &&
+ const char *name = arch_vma_name(vma);
+ if (!name) {
+ if (mm) {
+ if (vma->vm_start <= mm->start_brk &&
vma->vm_end >= mm->brk) {
- pad_len_spaces(m, len);
- seq_puts(m, "[heap]");
- } else {
- if (vma->vm_start <= mm->start_stack &&
- vma->vm_end >= mm->start_stack) {
-
- pad_len_spaces(m, len);
- seq_puts(m, "[stack]");
+ name = "[heap]";
+ } else if (vma->vm_start <= mm->start_stack &&
+ vma->vm_end >= mm->start_stack) {
+ name = "[stack]";
}
+ } else {
+ name = "[vdso]";
}
- } else {
+ }
+ if (name) {
pad_len_spaces(m, len);
- seq_puts(m, "[vdso]");
+ seq_puts(m, name);
}
}
seq_putc(m, '\n');
diff --git a/include/asm-i386/elf.h b/include/asm-i386/elf.h
index 4153d80e4d2b..1eac92cb5b16 100644
--- a/include/asm-i386/elf.h
+++ b/include/asm-i386/elf.h
@@ -10,6 +10,7 @@
#include <asm/processor.h>
#include <asm/system.h> /* for savesegment */
#include <asm/auxvec.h>
+#include <asm/desc.h>
#include <linux/utsname.h>
@@ -129,15 +130,41 @@ extern int dump_task_extended_fpu (struct task_struct *, struct user_fxsr_struct
#define ELF_CORE_COPY_FPREGS(tsk, elf_fpregs) dump_task_fpu(tsk, elf_fpregs)
#define ELF_CORE_COPY_XFPREGS(tsk, elf_xfpregs) dump_task_extended_fpu(tsk, elf_xfpregs)
-#define VSYSCALL_BASE (__fix_to_virt(FIX_VSYSCALL))
-#define VSYSCALL_EHDR ((const struct elfhdr *) VSYSCALL_BASE)
-#define VSYSCALL_ENTRY ((unsigned long) &__kernel_vsyscall)
+#define VDSO_HIGH_BASE (__fix_to_virt(FIX_VDSO))
+#define VDSO_BASE ((unsigned long)current->mm->context.vdso)
+
+#ifdef CONFIG_COMPAT_VDSO
+# define VDSO_COMPAT_BASE VDSO_HIGH_BASE
+# define VDSO_PRELINK VDSO_HIGH_BASE
+#else
+# define VDSO_COMPAT_BASE VDSO_BASE
+# define VDSO_PRELINK 0
+#endif
+
+#define VDSO_COMPAT_SYM(x) \
+ (VDSO_COMPAT_BASE + (unsigned long)(x) - VDSO_PRELINK)
+
+#define VDSO_SYM(x) \
+ (VDSO_BASE + (unsigned long)(x) - VDSO_PRELINK)
+
+#define VDSO_HIGH_EHDR ((const struct elfhdr *) VDSO_HIGH_BASE)
+#define VDSO_EHDR ((const struct elfhdr *) VDSO_COMPAT_BASE)
+
extern void __kernel_vsyscall;
+#define VDSO_ENTRY VDSO_SYM(&__kernel_vsyscall)
+
+#define ARCH_HAS_SETUP_ADDITIONAL_PAGES
+struct linux_binprm;
+extern int arch_setup_additional_pages(struct linux_binprm *bprm,
+ int executable_stack);
+
+extern unsigned int vdso_enabled;
+
#define ARCH_DLINFO \
-do { \
- NEW_AUX_ENT(AT_SYSINFO, VSYSCALL_ENTRY); \
- NEW_AUX_ENT(AT_SYSINFO_EHDR, VSYSCALL_BASE); \
+do if (vdso_enabled) { \
+ NEW_AUX_ENT(AT_SYSINFO, VDSO_ENTRY); \
+ NEW_AUX_ENT(AT_SYSINFO_EHDR, VDSO_COMPAT_BASE); \
} while (0)
/*
@@ -148,15 +175,15 @@ do { \
* Dumping its extra ELF program headers includes all the other information
* a debugger needs to easily find how the vsyscall DSO was being used.
*/
-#define ELF_CORE_EXTRA_PHDRS (VSYSCALL_EHDR->e_phnum)
+#define ELF_CORE_EXTRA_PHDRS (VDSO_HIGH_EHDR->e_phnum)
#define ELF_CORE_WRITE_EXTRA_PHDRS \
do { \
const struct elf_phdr *const vsyscall_phdrs = \
- (const struct elf_phdr *) (VSYSCALL_BASE \
- + VSYSCALL_EHDR->e_phoff); \
+ (const struct elf_phdr *) (VDSO_HIGH_BASE \
+ + VDSO_HIGH_EHDR->e_phoff); \
int i; \
Elf32_Off ofs = 0; \
- for (i = 0; i < VSYSCALL_EHDR->e_phnum; ++i) { \
+ for (i = 0; i < VDSO_HIGH_EHDR->e_phnum; ++i) { \
struct elf_phdr phdr = vsyscall_phdrs[i]; \
if (phdr.p_type == PT_LOAD) { \
BUG_ON(ofs != 0); \
@@ -174,10 +201,10 @@ do { \
#define ELF_CORE_WRITE_EXTRA_DATA \
do { \
const struct elf_phdr *const vsyscall_phdrs = \
- (const struct elf_phdr *) (VSYSCALL_BASE \
- + VSYSCALL_EHDR->e_phoff); \
+ (const struct elf_phdr *) (VDSO_HIGH_BASE \
+ + VDSO_HIGH_EHDR->e_phoff); \
int i; \
- for (i = 0; i < VSYSCALL_EHDR->e_phnum; ++i) { \
+ for (i = 0; i < VDSO_HIGH_EHDR->e_phnum; ++i) { \
if (vsyscall_phdrs[i].p_type == PT_LOAD) \
DUMP_WRITE((void *) vsyscall_phdrs[i].p_vaddr, \
PAGE_ALIGN(vsyscall_phdrs[i].p_memsz)); \
diff --git a/include/asm-i386/fixmap.h b/include/asm-i386/fixmap.h
index f7e068f4d2f9..a48cc3f7ccc6 100644
--- a/include/asm-i386/fixmap.h
+++ b/include/asm-i386/fixmap.h
@@ -51,7 +51,7 @@
*/
enum fixed_addresses {
FIX_HOLE,
- FIX_VSYSCALL,
+ FIX_VDSO,
#ifdef CONFIG_X86_LOCAL_APIC
FIX_APIC_BASE, /* local (CPU) APIC) -- required for SMP or not */
#endif
@@ -115,14 +115,6 @@ extern void __set_fixmap (enum fixed_addresses idx,
#define __fix_to_virt(x) (FIXADDR_TOP - ((x) << PAGE_SHIFT))
#define __virt_to_fix(x) ((FIXADDR_TOP - ((x)&PAGE_MASK)) >> PAGE_SHIFT)
-/*
- * This is the range that is readable by user mode, and things
- * acting like user mode such as get_user_pages.
- */
-#define FIXADDR_USER_START (__fix_to_virt(FIX_VSYSCALL))
-#define FIXADDR_USER_END (FIXADDR_USER_START + PAGE_SIZE)
-
-
extern void __this_fixmap_does_not_exist(void);
/*
diff --git a/include/asm-i386/mmu.h b/include/asm-i386/mmu.h
index f431a0b86d4c..8358dd3df7aa 100644
--- a/include/asm-i386/mmu.h
+++ b/include/asm-i386/mmu.h
@@ -12,6 +12,7 @@ typedef struct {
int size;
struct semaphore sem;
void *ldt;
+ void *vdso;
} mm_context_t;
#endif
diff --git a/include/asm-i386/page.h b/include/asm-i386/page.h
index e3a552fa5538..f5bf544c729a 100644
--- a/include/asm-i386/page.h
+++ b/include/asm-i386/page.h
@@ -96,6 +96,8 @@ typedef struct { unsigned long pgprot; } pgprot_t;
#ifndef __ASSEMBLY__
+struct vm_area_struct;
+
/*
* This much address space is reserved for vmalloc() and iomap()
* as well as fixmap mappings.
@@ -139,6 +141,7 @@ extern int page_is_ram(unsigned long pagenr);
#include <asm-generic/memory_model.h>
#include <asm-generic/page.h>
+#define __HAVE_ARCH_GATE_AREA 1
#endif /* __KERNEL__ */
#endif /* _I386_PAGE_H */
diff --git a/include/asm-i386/thread_info.h b/include/asm-i386/thread_info.h
index ff1e2b1a7c84..2833fa2c0dd0 100644
--- a/include/asm-i386/thread_info.h
+++ b/include/asm-i386/thread_info.h
@@ -37,6 +37,7 @@ struct thread_info {
0-0xBFFFFFFF for user-thead
0-0xFFFFFFFF for kernel-thread
*/
+ void *sysenter_return;
struct restart_block restart_block;
unsigned long previous_esp; /* ESP of the previous stack in case
diff --git a/include/asm-i386/unwind.h b/include/asm-i386/unwind.h
index d480f2e38215..69f0f1df6722 100644
--- a/include/asm-i386/unwind.h
+++ b/include/asm-i386/unwind.h
@@ -78,8 +78,8 @@ static inline int arch_unw_user_mode(const struct unwind_frame_info *info)
return user_mode_vm(&info->regs);
#else
return info->regs.eip < PAGE_OFFSET
- || (info->regs.eip >= __fix_to_virt(FIX_VSYSCALL)
- && info->regs.eip < __fix_to_virt(FIX_VSYSCALL) + PAGE_SIZE)
+ || (info->regs.eip >= __fix_to_virt(FIX_VDSO)
+ && info->regs.eip < __fix_to_virt(FIX_VDSO) + PAGE_SIZE)
|| info->regs.esp < PAGE_OFFSET;
#endif
}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a929ea197e48..ff1fa87df8d0 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1065,5 +1065,7 @@ void drop_slab(void);
extern int randomize_va_space;
#endif
+const char *arch_vma_name(struct vm_area_struct *vma);
+
#endif /* __KERNEL__ */
#endif /* _LINUX_MM_H */
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 349ef908a222..bee12a7a0576 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -189,6 +189,7 @@ enum
VM_ZONE_RECLAIM_MODE=31, /* reclaim local zone memory before going off node */
VM_ZONE_RECLAIM_INTERVAL=32, /* time period to wait after reclaim failure */
VM_PANIC_ON_OOM=33, /* panic at out-of-memory */
+ VM_VDSO_ENABLED=34, /* map VDSO into new processes? */
};
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index f1a4eb1a655e..f54afed8426f 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -927,6 +927,18 @@ static ctl_table vm_table[] = {
.proc_handler = &proc_dointvec_jiffies,
.strategy = &sysctl_jiffies,
},
+#endif
+#ifdef CONFIG_X86_32
+ {
+ .ctl_name = VM_VDSO_ENABLED,
+ .procname = "vdso_enabled",
+ .data = &vdso_enabled,
+ .maxlen = sizeof(vdso_enabled),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ .strategy = &sysctl_intvec,
+ .extra1 = &zero,
+ },
#endif
{ .ctl_name = 0 }
};
commit 26a3c49cec96ffb9cfcc30dfa0cd05ccc25dcb3a
Author: Ingo Molnar <mingo@elte.hu>
Date: Mon Jun 26 13:57:16 2006 +0200
[PATCH] x86_64: fix vector_lock deadlock in io_apic.c
Fix a potential deadlock scenario introduced by io_apic.c's new vector_lock
on i386 and x86_64.
Found by the locking correctness validator. The patch was boot-tested on
x86. For details of the deadlock scenario, see the validator output:
======================================================
[ BUG: hard-safe -> hard-unsafe lock order detected! ]
------------------------------------------------------
idle/1 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
(msi_lock){....}, at: [<c04ff8d2>] startup_msi_irq_wo_maskbit+0x10/0x35
and this task is already holding:
(&irq_desc[i].lock){++..}, at: [<c015b924>] probe_irq_on+0x36/0x107
which would create a new lock dependency:
(&irq_desc[i].lock){++..} -> (msi_lock){....}
but this new dependency connects a hard-irq-safe lock:
(&irq_desc[i].lock){++..}
... which became hard-irq-safe at:
[<c01468c4>] lockdep_acquire+0x68/0x84
[<c10485e9>] _spin_lock+0x21/0x2f
[<c015aff5>] __do_IRQ+0x3d/0x113
[<c01062d3>] do_IRQ+0x8c/0xad
to a hard-irq-unsafe lock:
(vector_lock){--..}
... which became hard-irq-unsafe at:
... [<c01468c4>] lockdep_acquire+0x68/0x84
[<c10485e9>] _spin_lock+0x21/0x2f
[<c011b5e8>] assign_irq_vector+0x34/0xc8
[<c1aa82fa>] setup_IO_APIC+0x45a/0xcff
[<c1aa56e3>] smp_prepare_cpus+0x5ea/0x8aa
[<c010033f>] init+0x32/0x2cb
[<c0102005>] kernel_thread_helper+0x5/0xb
which could potentially lead to deadlocks!
other info that might help us debug this:
3 locks held by idle/1:
#0: (port_mutex){--..}, at: [<c067070d>] uart_add_one_port+0x61/0x289
#1: (&state->mutex){--..}, at: [<c067071f>] uart_add_one_port+0x73/0x289
#2: (&irq_desc[i].lock){++..}, at: [<c015b924>] probe_irq_on+0x36/0x107
the hard-irq-safe lock's dependencies:
-> (&irq_desc[i].lock){++..} ops: 9861 {
initial-use at:
[<c01468c4>] lockdep_acquire+0x68/0x84
[<c10487f4>] _spin_lock_irqsave+0x2a/0x3a
[<c015b415>] setup_irq+0x9b/0x14d
[<c1aaa4c4>] time_init_hook+0xf/0x11
[<c1a9f320>] time_init+0x44/0x46
[<c1a9955f>] start_kernel+0x191/0x38f
[<c0100210>] 0xc0100210
in-hardirq-W at:
[<c01468c4>] lockdep_acquire+0x68/0x84
[<c10485e9>] _spin_lock+0x21/0x2f
[<c015aff5>] __do_IRQ+0x3d/0x113
[<c01062d3>] do_IRQ+0x8c/0xad
in-softirq-W at:
[<c01468c4>] lockdep_acquire+0x68/0x84
[<c10485e9>] _spin_lock+0x21/0x2f
[<c015aff5>] __do_IRQ+0x3d/0x113
[<c01062d3>] do_IRQ+0x8c/0xad
}
... key at: [<c1ea31e0>] irq_desc_lock_type+0x0/0x20
-> (i8259A_lock){++..} ops: 5149 {
initial-use at:
[<c01468c4>] lockdep_acquire+0x68/0x84
[<c10487f4>] _spin_lock_irqsave+0x2a/0x3a
[<c0108090>] init_8259A+0x11/0x8f
[<c1aa0d22>] init_ISA_irqs+0x12/0x4d
[<c1aaa4f0>] pre_intr_init_hook+0x8/0xa
[<c1aa0cb9>] init_IRQ+0xe/0x65
[<c1a99546>] start_kernel+0x178/0x38f
[<c0100210>] 0xc0100210
in-hardirq-W at:
[<c01468c4>] lockdep_acquire+0x68/0x84
[<c10487f4>] _spin_lock_irqsave+0x2a/0x3a
[<c0107fb0>] mask_and_ack_8259A+0x1b/0xcc
[<c015b007>] __do_IRQ+0x4f/0x113
[<c01062d3>] do_IRQ+0x8c/0xad
in-softirq-W at:
[<c01468c4>] lockdep_acquire+0x68/0x84
[<c10487f4>] _spin_lock_irqsave+0x2a/0x3a
[<c0107fb0>] mask_and_ack_8259A+0x1b/0xcc
[<c015b007>] __do_IRQ+0x4f/0x113
[<c01062d3>] do_IRQ+0x8c/0xad
}
... key at: [<c142f174>] i8259A_lock+0x14/0x40
... acquired at:
[<c01468c4>] lockdep_acquire+0x68/0x84
[<c10487f4>] _spin_lock_irqsave+0x2a/0x3a
[<c0107eb2>] enable_8259A_irq+0x10/0x47
[<c0107f12>] startup_8259A_irq+0x8/0xc
[<c015b45e>] setup_irq+0xe4/0x14d
[<c1aaa4c4>] time_init_hook+0xf/0x11
[<c1a9f320>] time_init+0x44/0x46
[<c1a9955f>] start_kernel+0x191/0x38f
[<c0100210>] 0xc0100210
-> (ioapic_lock){+...} ops: 122 {
initial-use at:
[<c01468c4>] lockdep_acquire+0x68/0x84
[<c10487f4>] _spin_lock_irqsave+0x2a/0x3a
[<c1aa71db>] io_apic_get_version+0x16/0x55
[<c1aa5c73>] mp_register_ioapic+0xc6/0x127
[<c1aa382e>] acpi_parse_ioapic+0x2d/0x39
[<c1abe031>] acpi_table_parse_madt_family+0xb4/0x100
[<c1abe093>] acpi_table_parse_madt+0x16/0x18
[<c1aa3c8a>] acpi_boot_init+0x132/0x251
[<c1aa08ea>] setup_arch+0xd36/0xe37
[<c1a99434>] start_kernel+0x66/0x38f
[<c0100210>] 0xc0100210
in-hardirq-W at:
[<c01468c4>] lockdep_acquire+0x68/0x84
[<c10487f4>] _spin_lock_irqsave+0x2a/0x3a
[<c011bce1>] mask_IO_APIC_irq+0x11/0x31
[<c011c5cc>] ack_edge_ioapic_vector+0x31/0x41
[<c015b007>] __do_IRQ+0x4f/0x113
[<c01062d3>] do_IRQ+0x8c/0xad
}
... key at: [<c1432514>] ioapic_lock+0x14/0x3c
-> (i8259A_lock){++..} ops: 5149 {
initial-use at:
[<c01468c4>] lockdep_acquire+0x68/0x84
[<c10487f4>] _spin_lock_irqsave+0x2a/0x3a
[<c0108090>] init_8259A+0x11/0x8f
[<c1aa0d22>] init_ISA_irqs+0x12/0x4d
[<c1aaa4f0>] pre_intr_init_hook+0x8/0xa
[<c1aa0cb9>] init_IRQ+0xe/0x65
[<c1a99546>] start_kernel+0x178/0x38f
[<c0100210>] 0xc0100210
in-hardirq-W at:
[<c01468c4>] lockdep_acquire+0x68/0x84
[<c10487f4>] _spin_lock_irqsave+0x2a/0x3a
[<c0107fb0>] mask_and_ack_8259A+0x1b/0xcc
[<c015b007>] __do_IRQ+0x4f/0x113
[<c01062d3>] do_IRQ+0x8c/0xad
in-softirq-W at:
[<c01468c4>] lockdep_acquire+0x68/0x84
[<c10487f4>] _spin_lock_irqsave+0x2a/0x3a
[<c0107fb0>] mask_and_ack_8259A+0x1b/0xcc
[<c015b007>] __do_IRQ+0x4f/0x113
[<c01062d3>] do_IRQ+0x8c/0xad
}
... key at: [<c142f174>] i8259A_lock+0x14/0x40
... acquired at:
[<c01468c4>] lockdep_acquire+0x68/0x84
[<c10487f4>] _spin_lock_irqsave+0x2a/0x3a
[<c0107e6b>] disable_8259A_irq+0x10/0x47
[<c011bdbd>] startup_edge_ioapic_vector+0x31/0x58
[<c015b45e>] setup_irq+0xe4/0x14d
[<c015b5a1>] request_irq+0xda/0xf9
[<c1ac983a>] rtc_init+0x6a/0x1a7
[<c0100457>] init+0x14a/0x2cb
[<c0102005>] kernel_thread_helper+0x5/0xb
... acquired at:
[<c01468c4>] lockdep_acquire+0x68/0x84
[<c10487f4>] _spin_lock_irqsave+0x2a/0x3a
[<c011bce1>] mask_IO_APIC_irq+0x11/0x31
[<c011c5cc>] ack_edge_ioapic_vector+0x31/0x41
[<c015b007>] __do_IRQ+0x4f/0x113
[<c01062d3>] do_IRQ+0x8c/0xad
the hard-irq-unsafe lock's dependencies:
-> (vector_lock){--..} ops: 31 {
initial-use at:
[<c01468c4>] lockdep_acquire+0x68/0x84
[<c10485e9>] _spin_lock+0x21/0x2f
[<c011b5e8>] assign_irq_vector+0x34/0xc8
[<c1aa82fa>] setup_IO_APIC+0x45a/0xcff
[<c1aa56e3>] smp_prepare_cpus+0x5ea/0x8aa
[<c010033f>] init+0x32/0x2cb
[<c0102005>] kernel_thread_helper+0x5/0xb
softirq-on-W at:
[<c01468c4>] lockdep_acquire+0x68/0x84
[<c10485e9>] _spin_lock+0x21/0x2f
[<c011b5e8>] assign_irq_vector+0x34/0xc8
[<c1aa82fa>] setup_IO_APIC+0x45a/0xcff
[<c1aa56e3>] smp_prepare_cpus+0x5ea/0x8aa
[<c010033f>] init+0x32/0x2cb
[<c0102005>] kernel_thread_helper+0x5/0xb
hardirq-on-W at:
[<c01468c4>] lockdep_acquire+0x68/0x84
[<c10485e9>] _spin_lock+0x21/0x2f
[<c011b5e8>] assign_irq_vector+0x34/0xc8
[<c1aa82fa>] setup_IO_APIC+0x45a/0xcff
[<c1aa56e3>] smp_prepare_cpus+0x5ea/0x8aa
[<c010033f>] init+0x32/0x2cb
[<c0102005>] kernel_thread_helper+0x5/0xb
}
... key at: [<c1432574>] vector_lock+0x14/0x3c
stack backtrace:
[<c0104f36>] show_trace+0xd/0xf
[<c010543e>] dump_stack+0x17/0x19
[<c0144e34>] check_usage+0x1f6/0x203
[<c0146395>] __lockdep_acquire+0x8c2/0xaa5
[<c01468c4>] lockdep_acquire+0x68/0x84
[<c10487f4>] _spin_lock_irqsave+0x2a/0x3a
[<c04ff8d2>] startup_msi_irq_wo_maskbit+0x10/0x35
[<c015b932>] probe_irq_on+0x44/0x107
[<c0673d58>] serial8250_config_port+0x84b/0x986
[<c06707b1>] uart_add_one_port+0x105/0x289
[<c1ace54b>] serial8250_init+0xc3/0x10a
[<c0100457>] init+0x14a/0x2cb
[<c0102005>] kernel_thread_helper+0x5/0xb
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
diff --git a/arch/i386/kernel/io_apic.c b/arch/i386/kernel/io_apic.c
index 61317f4be444..72ae414e4d49 100644
--- a/arch/i386/kernel/io_apic.c
+++ b/arch/i386/kernel/io_apic.c
@@ -1163,14 +1163,15 @@ u8 irq_vector[NR_IRQ_VECTORS] __read_mostly = { FIRST_DEVICE_VECTOR , 0 };
int assign_irq_vector(int irq)
{
static int current_vector = FIRST_DEVICE_VECTOR, offset = 0;
+ unsigned long flags;
int vector;
BUG_ON(irq != AUTO_ASSIGN && (unsigned)irq >= NR_IRQ_VECTORS);
- spin_lock(&vector_lock);
+ spin_lock_irqsave(&vector_lock, flags);
if (irq != AUTO_ASSIGN && IO_APIC_VECTOR(irq) > 0) {
- spin_unlock(&vector_lock);
+ spin_unlock_irqrestore(&vector_lock, flags);
return IO_APIC_VECTOR(irq);
}
next:
@@ -1181,7 +1182,7 @@ int assign_irq_vector(int irq)
if (current_vector >= FIRST_SYSTEM_VECTOR) {
offset++;
if (!(offset%8)) {
- spin_unlock(&vector_lock);
+ spin_unlock_irqrestore(&vector_lock, flags);
return -ENOSPC;
}
current_vector = FIRST_DEVICE_VECTOR + offset;
@@ -1192,7 +1193,7 @@ int assign_irq_vector(int irq)
if (irq != AUTO_ASSIGN)
IO_APIC_VECTOR(irq) = vector;
- spin_unlock(&vector_lock);
+ spin_unlock_irqrestore(&vector_lock, flags);
return vector;
}
diff --git a/arch/x86_64/kernel/io_apic.c b/arch/x86_64/kernel/io_apic.c
index 38a3ff30bde1..519cd4e6f9e7 100644
--- a/arch/x86_64/kernel/io_apic.c
+++ b/arch/x86_64/kernel/io_apic.c
@@ -836,14 +836,15 @@ u8 irq_vector[NR_IRQ_VECTORS] __read_mostly = { FIRST_DEVICE_VECTOR , 0 };
int assign_irq_vector(int irq)
{
static int current_vector = FIRST_DEVICE_VECTOR, offset = 0;
+ unsigned long flags;
int vector;
BUG_ON(irq != AUTO_ASSIGN && (unsigned)irq >= NR_IRQ_VECTORS);
- spin_lock(&vector_lock);
+ spin_lock_irqsave(&vector_lock, flags);
if (irq != AUTO_ASSIGN && IO_APIC_VECTOR(irq) > 0) {
- spin_unlock(&vector_lock);
+ spin_unlock_irqrestore(&vector_lock, flags);
return IO_APIC_VECTOR(irq);
}
next:
@@ -862,7 +863,7 @@ int assign_irq_vector(int irq)
if (irq != AUTO_ASSIGN)
IO_APIC_VECTOR(irq) = vector;
- spin_unlock(&vector_lock);
+ spin_unlock_irqrestore(&vector_lock, flags);
return vector;
}
commit 14118c3cdd46d72e503ee2f727b11d881f72f755
Author: Ingo Molnar <mingo@elte.hu>
Date: Mon Jun 26 13:56:58 2006 +0200
[PATCH] x86_64: fix unlikely profiling & vsyscalls on x86_64
fix unlikely profiling in vsyscalls ...
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
diff --git a/arch/x86_64/kernel/vsyscall.c b/arch/x86_64/kernel/vsyscall.c
index 9468fb20b0bc..f603037df162 100644
--- a/arch/x86_64/kernel/vsyscall.c
+++ b/arch/x86_64/kernel/vsyscall.c
@@ -107,7 +107,7 @@ static __always_inline long time_syscall(long *t)
int __vsyscall(0) vgettimeofday(struct timeval * tv, struct timezone * tz)
{
- if (unlikely(!__sysctl_vsyscall))
+ if (!__sysctl_vsyscall)
return gettimeofday(tv,tz);
if (tv)
do_vgettimeofday(tv);
@@ -120,7 +120,7 @@ int __vsyscall(0) vgettimeofday(struct timeval * tv, struct timezone * tz)
* unlikely */
time_t __vsyscall(1) vtime(time_t *t)
{
- if (unlikely(!__sysctl_vsyscall))
+ if (!__sysctl_vsyscall)
return time_syscall(t);
else if (t)
*t = __xtime.tv_sec;
commit 3c5846470c30580bbcb4d5f2339f75a2c88cfe6e
Author: Ingo Molnar <mingo@elte.hu>
Date: Mon Jun 26 13:56:25 2006 +0200
[PATCH] x86_64: x86_64-enable-large-bzImage.patch
enable large bzImages on x86_64. (fix is from x86's build.c) Using this
patch i have successfully built and booted an allyesconfig 13MB+ bzImage
on x86_64 too:
$ size64 vmlinux
text data bss dec hex filename
23444831 8202642 3439360 35086833 21761f1 vmlinux
-rw-rw-r-- 1 mingo mingo 13121740 Apr 19 09:32 arch/x86_64/boot/bzImage
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
diff --git a/arch/x86_64/boot/tools/build.c b/arch/x86_64/boot/tools/build.c
index c44f5e2ec100..eae86691709a 100644
--- a/arch/x86_64/boot/tools/build.c
+++ b/arch/x86_64/boot/tools/build.c
@@ -149,10 +149,8 @@ int main(int argc, char ** argv)
sz = sb.st_size;
fprintf (stderr, "System is %d kB\n", sz/1024);
sys_size = (sz + 15) / 16;
- /* 0x40000*16 = 4.0 MB, reasonable estimate for the current maximum */
- if (sys_size > (is_big_kernel ? 0x40000 : DEF_SYSSIZE))
- die("System is too big. Try using %smodules.",
- is_big_kernel ? "" : "bzImage or ");
+ if (!is_big_kernel && sys_size > DEF_SYSSIZE)
+ die("System is too big. Try using bzImage or modules.");
while (sz > 0) {
int l, n;
commit 3d1c1cc962cebaae6a70fd89a0adb29ad10a2a12
Author: Ingo Molnar <mingo@elte.hu>
Date: Mon Jun 26 00:26:17 2006 -0700
[PATCH] fix IDE deadlock in error reporting code
Michal Piotrowski reported the following validator assert:
hdd: set_drive_speed_status: status=0x51 { DriveReady SeekComplete Error }
hdd: set_drive_speed_status: error=0xb4 { AbortedCommand LastFailedSense=0x0b }
============================
[ BUG: illegal lock usage! ]
----------------------------
illegal {in-hardirq-W} -> {hardirq-on-W} usage.
hdparm/1821 [HC0[0]:SC0[0]:HE1:SE1] takes:
(ide_lock){++..}, at: [<c0268388>] ide_dump_opcode+0x13/0x9b
[...]
stack backtrace:
[<c0104513>] show_trace+0x1b/0x20
[<c01045f1>] dump_stack+0x1f/0x24
[<c013976c>] print_usage_bug+0x1a5/0x1b1
[<c0139e90>] mark_lock+0x2ca/0x4f7
[<c013aa96>] __lockdep_acquire+0x47e/0xaa4
[<c013b536>] lockdep_acquire+0x67/0x7f
[<c030552d>] _spin_lock+0x24/0x32
[<c0268388>] ide_dump_opcode+0x13/0x9b
[<c02688b6>] ide_dump_status+0x4a6/0x4cc
[<c0267ae6>] ide_config_drive_speed+0x32a/0x33a
[<c0262dc5>] piix_tune_chipset+0x2ed/0x2f8
[<c0262e31>] piix_config_drive_xfer_rate+0x61/0xb5
[<c0263a82>] set_using_dma+0x2f/0x60
[<c0263bee>] ide_write_setting+0x4a/0xc3
[<c02647ca>] generic_ide_ioctl+0x8a/0x47f
[<f886003a>] idecd_ioctl+0xfd/0x133 [ide_cd]
[<c01f1fff>] blkdev_driver_ioctl+0x4b/0x5f
[<c01f2783>] blkdev_ioctl+0x770/0x7bd
[<c017dc0d>] block_ioctl+0x1f/0x21
[<c0189353>] do_ioctl+0x27/0x6e
[<c0189604>] vfs_ioctl+0x26a/0x280
[<c0189667>] sys_ioctl+0x4d/0x7e
[<c0305ed2>] sysenter_past_esp+0x63/0xa1
in ide_dump_opcode() takes the ide_lock in an irq-unsafe manner, i.e. this
function expects to be called with irqs disabled. But
ide_dump_ata[pi]_status() doesnt do that - it enables interrupts specifically.
That is a no-no - what guarantees that another IDE port couldnt generate an
IDE interrupt while we are dumping this error? The fix is to turn the
irq-enabling in these functions into irq-disabling.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
diff --git a/drivers/ide/ide-lib.c b/drivers/ide/ide-lib.c
index 16a143133f93..7ddb11828731 100644
--- a/drivers/ide/ide-lib.c
+++ b/drivers/ide/ide-lib.c
@@ -485,7 +485,7 @@ static u8 ide_dump_ata_status(ide_drive_t *drive, const char *msg, u8 stat)
unsigned long flags;
u8 err = 0;
- local_irq_set(flags);
+ local_irq_save(flags);
printk("%s: %s: status=0x%02x { ", drive->name, msg, stat);
if (stat & BUSY_STAT)
printk("Busy ");
@@ -567,7 +567,7 @@ static u8 ide_dump_atapi_status(ide_drive_t *drive, const char *msg, u8 stat)
status.all = stat;
error.all = 0;
- local_irq_set(flags);
+ local_irq_save(flags);
printk("%s: %s: status=0x%02x { ", drive->name, msg, stat);
if (status.b.bsy)
printk("Busy ");
commit 81615b624a45621b758380ec45d750483eae281d
Author: Ingo Molnar <mingo@elte.hu>
Date: Mon Jun 26 00:24:32 2006 -0700
[PATCH] Convert kernel/cpu.c to mutexes
Convert kernel/cpu.c from semaphore to mutex.
I've reviewed all lock_cpu_hotplug() critical sections, and they all seem to
fit mutex semantics.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
diff --git a/kernel/cpu.c b/kernel/cpu.c
index fe2b8d0bfe4c..03dcd981846a 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -13,10 +13,10 @@
#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/stop_machine.h>
-#include <asm/semaphore.h>
+#include <linux/mutex.h>
/* This protects CPUs going up and down... */
-static DECLARE_MUTEX(cpucontrol);
+static DEFINE_MUTEX(cpucontrol);
static BLOCKING_NOTIFIER_HEAD(cpu_chain);
@@ -30,9 +30,9 @@ static int __lock_cpu_hotplug(int interruptible)
if (lock_cpu_hotplug_owner != current) {
if (interruptible)
- ret = down_interruptible(&cpucontrol);
+ ret = mutex_lock_interruptible(&cpucontrol);
else
- down(&cpucontrol);
+ mutex_lock(&cpucontrol);
}
/*
@@ -56,7 +56,7 @@ void unlock_cpu_hotplug(void)
{
if (--lock_cpu_hotplug_depth == 0) {
lock_cpu_hotplug_owner = NULL;
- up(&cpucontrol);
+ mutex_unlock(&cpucontrol);
}
}
EXPORT_SYMBOL_GPL(unlock_cpu_hotplug);
commit 1fb00c6cbd8356f43b46322742f3c01c2a1f02da
Author: Ingo Molnar <mingo@elte.hu>
Date: Mon Jun 26 00:24:31 2006 -0700
[PATCH] work around ppc64 bootup bug by making mutex-debugging save/restore irqs
It seems ppc64 wants to lock mutexes in early bootup code, with interrupts
disabled, and they expect interrupts to stay disabled, else they crash.
Work around this bug by making mutex debugging variants save/restore irq
flags.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
diff --git a/kernel/mutex-debug.c b/kernel/mutex-debug.c
index f4913c376950..036b6285b15c 100644
--- a/kernel/mutex-debug.c
+++ b/kernel/mutex-debug.c
@@ -153,13 +153,13 @@ void show_held_locks(struct task_struct *filter)
continue;
count++;
cursor = curr->next;
- debug_spin_lock_restore(&debug_mutex_lock, flags);
+ debug_spin_unlock_restore(&debug_mutex_lock, flags);
printk("\n#%03d: ", count);
printk_lock(lock, filter ? 0 : 1);
goto next;
}
- debug_spin_lock_restore(&debug_mutex_lock, flags);
+ debug_spin_unlock_restore(&debug_mutex_lock, flags);
printk("\n");
}
@@ -316,7 +316,7 @@ void mutex_debug_check_no_locks_held(struct task_struct *task)
continue;
list_del_init(curr);
DEBUG_OFF();
- debug_spin_lock_restore(&debug_mutex_lock, flags);
+ debug_spin_unlock_restore(&debug_mutex_lock, flags);
printk("BUG: %s/%d, lock held at task exit time!\n",
task->comm, task->pid);
@@ -325,7 +325,7 @@ void mutex_debug_check_no_locks_held(struct task_struct *task)
printk("exiting task is not even the owner??\n");
return;
}
- debug_spin_lock_restore(&debug_mutex_lock, flags);
+ debug_spin_unlock_restore(&debug_mutex_lock, flags);
}
/*
@@ -352,7 +352,7 @@ void mutex_debug_check_no_locks_freed(const void *from, unsigned long len)
continue;
list_del_init(curr);
DEBUG_OFF();
- debug_spin_lock_restore(&debug_mutex_lock, flags);
+ debug_spin_unlock_restore(&debug_mutex_lock, flags);
printk("BUG: %s/%d, active lock [%p(%p-%p)] freed!\n",
current->comm, current->pid, lock, from, to);
@@ -362,7 +362,7 @@ void mutex_debug_check_no_locks_freed(const void *from, unsigned long len)
printk("freeing task is not even the owner??\n");
return;
}
- debug_spin_lock_restore(&debug_mutex_lock, flags);
+ debug_spin_unlock_restore(&debug_mutex_lock, flags);
}
/*
diff --git a/kernel/mutex-debug.h b/kernel/mutex-debug.h
index fd384050acb1..a5196c36a5fd 100644
--- a/kernel/mutex-debug.h
+++ b/kernel/mutex-debug.h
@@ -46,21 +46,6 @@ extern void mutex_remove_waiter(struct mutex *lock, struct mutex_waiter *waiter,
extern void debug_mutex_unlock(struct mutex *lock);
extern void debug_mutex_init(struct mutex *lock, const char *name);
-#define debug_spin_lock(lock) \
- do { \
- local_irq_disable(); \
- if (debug_mutex_on) \
- spin_lock(lock); \
- } while (0)
-
-#define debug_spin_unlock(lock) \
- do { \
- if (debug_mutex_on) \
- spin_unlock(lock); \
- local_irq_enable(); \
- preempt_check_resched(); \
- } while (0)
-
#define debug_spin_lock_save(lock, flags) \
do { \
local_irq_save(flags); \
@@ -68,7 +53,7 @@ extern void debug_mutex_init(struct mutex *lock, const char *name);
spin_lock(lock); \
} while (0)
-#define debug_spin_lock_restore(lock, flags) \
+#define debug_spin_unlock_restore(lock, flags) \
do { \
if (debug_mutex_on) \
spin_unlock(lock); \
@@ -76,20 +61,20 @@ extern void debug_mutex_init(struct mutex *lock, const char *name);
preempt_check_resched(); \
} while (0)
-#define spin_lock_mutex(lock) \
+#define spin_lock_mutex(lock, flags) \
do { \
struct mutex *l = container_of(lock, struct mutex, wait_lock); \
\
DEBUG_WARN_ON(in_interrupt()); \
- debug_spin_lock(&debug_mutex_lock); \
+ debug_spin_lock_save(&debug_mutex_lock, flags); \
spin_lock(lock); \
DEBUG_WARN_ON(l->magic != l); \
} while (0)
-#define spin_unlock_mutex(lock) \
+#define spin_unlock_mutex(lock, flags) \
do { \
spin_unlock(lock); \
- debug_spin_unlock(&debug_mutex_lock); \
+ debug_spin_unlock_restore(&debug_mutex_lock, flags); \
} while (0)
#define DEBUG_OFF() \
diff --git a/kernel/mutex.c b/kernel/mutex.c
index 5449b210d9ed..7043db21bbce 100644
--- a/kernel/mutex.c
+++ b/kernel/mutex.c
@@ -125,10 +125,11 @@ __mutex_lock_common(struct mutex *lock, long state __IP_DECL__)
struct task_struct *task = current;
struct mutex_waiter waiter;
unsigned int old_val;
+ unsigned long flags;
debug_mutex_init_waiter(&waiter);
- spin_lock_mutex(&lock->wait_lock);
+ spin_lock_mutex(&lock->wait_lock, flags);
debug_mutex_add_waiter(lock, &waiter, task->thread_info, ip);
@@ -157,7 +158,7 @@ __mutex_lock_common(struct mutex *lock, long state __IP_DECL__)
if (unlikely(state == TASK_INTERRUPTIBLE &&
signal_pending(task))) {
mutex_remove_waiter(lock, &waiter, task->thread_info);
- spin_unlock_mutex(&lock->wait_lock);
+ spin_unlock_mutex(&lock->wait_lock, flags);
debug_mutex_free_waiter(&waiter);
return -EINTR;
@@ -165,9 +166,9 @@ __mutex_lock_common(struct mutex *lock, long state __IP_DECL__)
__set_task_state(task, state);
/* didnt get the lock, go to sleep: */
- spin_unlock_mutex(&lock->wait_lock);
+ spin_unlock_mutex(&lock->wait_lock, flags);
schedule();
- spin_lock_mutex(&lock->wait_lock);
+ spin_lock_mutex(&lock->wait_lock, flags);
}
/* got the lock - rejoice! */
@@ -178,7 +179,7 @@ __mutex_lock_common(struct mutex *lock, long state __IP_DECL__)
if (likely(list_empty(&lock->wait_list)))
atomic_set(&lock->count, 0);
- spin_unlock_mutex(&lock->wait_lock);
+ spin_unlock_mutex(&lock->wait_lock, flags);
debug_mutex_free_waiter(&waiter);
@@ -203,10 +204,11 @@ static fastcall noinline void
__mutex_unlock_slowpath(atomic_t *lock_count __IP_DECL__)
{
struct mutex *lock = container_of(lock_count, struct mutex, count);
+ unsigned long flags;
DEBUG_WARN_ON(lock->owner != current_thread_info());
- spin_lock_mutex(&lock->wait_lock);
+ spin_lock_mutex(&lock->wait_lock, flags);
/*
* some architectures leave the lock unlocked in the fastpath failure
@@ -231,7 +233,7 @@ __mutex_unlock_slowpath(atomic_t *lock_count __IP_DECL__)
debug_mutex_clear_owner(lock);
- spin_unlock_mutex(&lock->wait_lock);
+ spin_unlock_mutex(&lock->wait_lock, flags);
}
/*
@@ -276,9 +278,10 @@ __mutex_lock_interruptible_slowpath(atomic_t *lock_count __IP_DECL__)
static inline int __mutex_trylock_slowpath(atomic_t *lock_count)
{
struct mutex *lock = container_of(lock_count, struct mutex, count);
+ unsigned long flags;
int prev;
- spin_lock_mutex(&lock->wait_lock);
+ spin_lock_mutex(&lock->wait_lock, flags);
prev = atomic_xchg(&lock->count, -1);
if (likely(prev == 1))
@@ -287,7 +290,7 @@ static inline int __mutex_trylock_slowpath(atomic_t *lock_count)
if (likely(list_empty(&lock->wait_list)))
atomic_set(&lock->count, 0);
- spin_unlock_mutex(&lock->wait_lock);
+ spin_unlock_mutex(&lock->wait_lock, flags);
return prev == 1;
}
diff --git a/kernel/mutex.h b/kernel/mutex.h
index 00fe84e7b672..069189947257 100644
--- a/kernel/mutex.h
+++ b/kernel/mutex.h
@@ -9,8 +9,10 @@
* !CONFIG_DEBUG_MUTEXES case. Most of them are NOPs:
*/
-#define spin_lock_mutex(lock) spin_lock(lock)
-#define spin_unlock_mutex(lock) spin_unlock(lock)
+#define spin_lock_mutex(lock, flags) \
+ do { spin_lock(lock); (void)(flags); } while (0)
+#define spin_unlock_mutex(lock, flags) \
+ do { spin_unlock(lock); (void)(flags); } while (0)
#define mutex_remove_waiter(lock, waiter, ti) \
__list_del((waiter)->list.prev, (waiter)->list.next)
commit cfc736564fd01ee008d746913b1bbb90e3eb1f99
Author: Ingo Molnar <mingo@elte.hu>
Date: Mon Jun 26 15:26:13 2006 +0100
[ARM] fix drivers/mfd/ucb1x00-core.c IRQ probing bug
While reviewing the IRQ autoprobing code i found the attached buglet.
probe_irq_on()/off() calls must always be in pairs, because the generic IRQ
code uses a global semaphore to serialize all autoprobing activites.
(which does make sense) The ARM code's probe_irq_*() implementation does
not do this, but if this driver is ever used on another platform, this bug
might bite.
(It probably does not trigger in practice, because a zero probing mask
returned should be rare - but still.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
diff --git a/drivers/mfd/ucb1x00-core.c b/drivers/mfd/ucb1x00-core.c
index aff83f966803..c8426a9bf273 100644
--- a/drivers/mfd/ucb1x00-core.c
+++ b/drivers/mfd/ucb1x00-core.c
@@ -420,8 +420,10 @@ static int ucb1x00_detect_irq(struct ucb1x00 *ucb)
unsigned long mask;
mask = probe_irq_on();
- if (!mask)
+ if (!mask) {
+ probe_irq_off(mask);
return NO_IRQ;
+ }
/*
* Enable the ADC interrupt.